Mucosal microbiota alterations in primary sclerosis cholangitis persist after liver transplantation and are associated with clinical features independently of geography

1. Hypothesis: PSC is associated with the gut microbiome changes

1 Introduction

1.1 About

Pre_LTx vs Post_LTx vs Healthy analysis on merged data

  • Alpha diversity – Richness, Shannon, Simpson, and Pielou indices tested by linear fixed effect or mixed effect model (group effect, country effect, interaction effect)

  • Beta diversity – PERMANOVA (group effect, country effect, interaction effect), PCA visualization

  • Differential abundance testing:

    • o Group effect – linDA + MaAsLin2 intersection

    • o Country effect – taxa with significant interaction effect were excluded based on individual post-hoc analysis. Taxa had to have the same direction in both countries

  • ML models, trained and validated using out-of-sample bootstrap (500 resamplings):

  • o ENET

  • o kNN

  • o GBoost

  • o RF


Pairwise comparisons:

Pre_ltx vs Healthy – an effect of disease, liver cirrhosis and overall bad clinical condition

Pre_ltx vs post_ltx – an effect of transplantation + general improvement of the clinical state

Post_ltx vs Healthy – does LTx lead to „healthy microbiome”?

Report structure The analysis is divided into two segments: terminal ileum and colon. Each part begins with data filtering, where sequencing depth is assessed and nearZeroVar() is applied to remove taxa with no variance in the dataset. This is followed by a series of analytical steps:

  1. Alpha diversity

  2. Beta diversity

  3. Univariate testing

  4. Machine learning models training and validation

At the end of the analysis, a brief summary of the results is provided.

Importing libraries and custom functions built for this analysis

source("custom_functions.R")

1.2 Clinical characteristics

clinical_char <- read.xlsx("../results/clinical_characteristics.xlsx",
                           colNames = FALSE)
colnames(clinical_char) <- clinical_char[3,]
clinical_char <- clinical_char[-c(1,2,3),]
header_groups <- c(" " = 2, "HC" = 2, "pre_LTx" = 2, "post_LTx_non-rPSC"=2, "post_LTx_rPSC"=2)
knitr::kable(clinical_char, align="l", escape=FALSE,digits=3) %>% 
  kable_styling(bootstrap_options = c("hover", "striped", "responsive")) %>% 
  kable_styling(full_width = TRUE) %>%
  add_header_above(header_groups)
HC
pre_LTx
post_LTx_non-rPSC
post_LTx_rPSC
NA Norway Czech Norway Czech Norway Czech Norway Czech
4 F/M [n] 10/30 30/26 19/65 12/19 15/36 26/57 13 3/22
5 Age [years] NA 50 (23;68) 40 (17;77) 35 (17;63) 49 (23;70) 48 (32;81) NA 48 (26;81)
6 IBD yes/no [n] - - 66/18 25/6 31/7 72/11 11/2 24/1
7 Total bilirubin [µmol/l] - 21 (3;68) 24 62 (9;500) 19.5 (11;49) 18 (5;132) 24 (13;52) 24 (6;148)
8 AST [µkat/l] - 0.5 (0.3;1.9) 1.1 (0.2;8.6) 1.6 (0.5;4.0) 0.6 (0.2;3.2) 0.4 (0.2;1.7) 1.1 (0.3, 4.0) 0.5 (0.2;1.9)
9 ALT [µkat/l] - 0.5 (0.3;1.7) 1.6 (0.5;8.6) 1.7 (0.5;5.2) 0.6 (0.1,4.1) 0.5 (0.3;4.0) 1.1 (0.2;4.5) 0.5 (0.3;3.5)
10 ALP [µkat/l] - 1.3 (0.7;2.3) 4.2 (0.9;17.3) 6.0 (1.6;17.8) 1.3 (0.6;7.9) 1.4 (0.6;3.5) 3.7 (0.5;13.9) 2.3 (0.5;13.1)
11 GGT [µkat/l] - 0.4 (0.2;5.0) - 2.6 (0.6;15.2) - 0.4 (0.1; 4.8) - 1.4 (0.3;13.7)
12 INR [-] - 1.1 (1.1;1.2) - 1.1 (0.9;1.8) - 1.1 (0.9;1.6) - 1.1 (1.0;1.7)
13 Creatinine [µmol/l] - 77 (50;115) 65 (39; 232) 60 (44;101) 84 (59; 159) 87 (49;248) 81 (65; 133) 85 (50;780)
14 Albumin [g/l] - 50 (43;57) 41 (26;49) 38 (20;50) 43 (35;49) 45 (26;51) 38 (34; 50) 42 (26;50)
15 Fecal calprotectin [µg/g] - - 59 (1;2844) 123 (6;4513) 30 (1;1945) 275 (6;4821) 55 (10;832) 564 (14;4301)
16 NANCY_max [-] - - - 2 (0;4) - 2 (0;4) - 2 (0;4)
17 eMayo [-] - - - 1 (0;2) - 1 (0;3) - 1 (0;2)
18 Mayo_DAI [-] - - - 2 (0;5) - 1 (0;8) - 2 (0;7)
19 Mayo_PSC risk score [-] - - 0.3 (;2.2;3.5) 0.3 (;2.2;3.5) 0.3 (;1.1;2.2) - 1.0 (;1.4;3.1) -
20 AOM_score [-] - - 1.9 (0.5;3.8) 2 (1.1;5.0) 1.9 (0.8;2.7) - 2.2 (1.2;3.5) -
21 APRI_score [-] - 0.2 (0.1;1.5) 0.7 (0.1;5.5) 1.1 (0.3;14.3) 0.5 (0.2;2.3) 0.3 (0.1;1.5) 0.7 (0.2;3.6) 0.4 (0.1;1.4)
22 FIB-4_score [-] - 0.8 (0.4;3.4) 1.1 (0.8;8.6) 1.6 (0.4;23.1) 1.5 (0.6;3.6) 1.1 (0.3; 5.0) 2.2 (0.5;5.8) 1.1 (0.5;3.1)
23 MELD_score [-] - 7.8 (7.1;8.6) - 8.2 (6.4;18.0) - 8.0 (6.4;14.6) - 7.8 (6.5;22.8)
24 Platelets [10^9/l] - 267 (145;400) 265 (37; 712) 201 (20;477) 197 (101;434) 208 (41;442) 212 (69;409) 213 (83;416)

1.3 Data Import

Importing ASV, taxa and metadata tables for both Czech and Norway samples.

Czech

path = "../../data/analysis_ready_data/ikem/"
asv_tab_ikem <- as.data.frame(fread(file.path(path,"asv_table_ikem.csv"),
                                    check.names = FALSE))
taxa_tab_ikem <- as.data.frame(fread(file.path(path,"taxa_table_ikem.csv"),
                                     check.names = FALSE))
metadata_ikem <- as.data.frame(fread(file.path(path,"metadata_ikem.csv"),
                                     check.names = FALSE))

Norway

path = "../../data/analysis_ready_data/norway/"
asv_tab_norway <- as.data.frame(fread(file.path(path,"asv_table_norway.csv"),
                                    check.names = FALSE))
taxa_tab_norway <- as.data.frame(fread(file.path(path,"taxa_table_norway.csv"),
                                    check.names = FALSE))
metadata_norway <- as.data.frame(fread(file.path(path,"metadata_norway.csv"),
                                    check.names = FALSE))

1.4 Merging

Merging two countries based on the different matrices - Ileum, Colon.

Terminal ileum

ileum_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="TI",Q="Q1")
Removing 1498 ASV(s)
Removing 1834 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
ileum_asv_tab <- ileum_data[[1]]
ileum_taxa_tab <- ileum_data[[2]]
ileum_metadata <- ileum_data[[3]]

Reads statistics

summary(colSums(ileum_asv_tab[,-1]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   3441   16579   20972   24071   28473   85910 

Colon

colon_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="colon",Q="Q1")
Removing 739 ASV(s)
Removing 266 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
colon_asv_tab <- colon_data[[1]]
colon_taxa_tab <- colon_data[[2]]
colon_metadata <- colon_data[[3]]

Reads statistics

summary(colSums(colon_asv_tab[,-1]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     63   19814   26676   30841   37553  136048 

2 Data Analysis - Terminal ileum

segment="terminal_ileum"

2.1 Filtering

Rules:

  • sequencing depth > 10000

  • nearZeroVar() with default settings

Rarefaction Curve

path="../intermediate_files/rarecurves"
seq_depth_threshold <- 10000
ps <- construct_phyloseq(ileum_asv_tab,ileum_taxa_tab,ileum_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_ileum.Rdata"))
load(file.path(path,"rarefaction_ileum.Rdata"))

prare <- ggrarecurve(obj=rareres,
                      factorNames="Country",
                      indexNames=c("Observe")) + 
  theme_bw() +
  theme(axis.text=element_text(size=8), panel.grid=element_blank(),
        strip.background = element_rect(colour=NA,fill="grey"),
        strip.text.x = element_text(face="bold")) + 
  geom_vline(xintercept = seq_depth_threshold, 
             linetype="dashed", 
             color = "red") + 
  xlim(0, 20000)
The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)
prare

Library size

read_counts(ileum_asv_tab, line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.1 Sequencing depth

data_filt <- seq_depth_filtering(ileum_asv_tab,
                                 ileum_taxa_tab,
                                 ileum_metadata,
                                 seq_depth_threshold = 10000)
Removing 131 ASV(s)
filt_ileum_asv_tab <- data_filt[[1]]; alpha_ileum_asv_tab <- filt_ileum_asv_tab
filt_ileum_taxa_tab <- data_filt[[2]]; alpha_ileum_taxa_tab <- filt_ileum_taxa_tab
filt_ileum_metadata <- data_filt[[3]]; alpha_ileum_metadata <- filt_ileum_metadata

seq_step <- dim(filt_ileum_asv_tab)[1]

Library size

read_counts(filt_ileum_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.2 NearZeroVar

data_filt <- nearzerovar_filtering(filt_ileum_asv_tab, 
                                   filt_ileum_taxa_tab,
                                   filt_ileum_metadata)

filt_ileum_asv_tab <- data_filt[[1]]
filt_ileum_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_ileum_asv_tab)[1]

Library size

read_counts(filt_ileum_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.3 Final Counts

final_counts_filtering(ileum_asv_tab,
                       filt_ileum_asv_tab,
                       filt_ileum_metadata,
                       seq_step, 0, nearzero_step) %>% `colnames<-`("Count")
                            Count
Raw data: ASVs               5132
Raw data: Samples             294
Sequencing depth filt: ASVs  5001
Prevalence filt: ASVs           0
NearZeroVar filt: ASVs        479
Filt data: ASVs               479
Filt data: Samples            281
Filt data: Patients           281
Filt data: Patients.1           0
Filtered samples               13
Matrices                       TI
healthy                        73
non-rPSC                        0
rPSC                            0
pre_ltx                        70
post_ltx                      138
ETOH                            0

2.2 Alpha diversity

Calculating Richness, Shannon, Simpson, Pielou indexes on raw (unfiltered) rarefied data. Samples with sequencing depth < 10000 were excluded. Testing by linear model.

path = "../results/Q1/alpha_diversity"

Calculation

# Construct MPSE object
alpha_ileum_metadata$Sample <- alpha_ileum_metadata$SampleID
ileum_mpse <- as.MPSE(construct_phyloseq(alpha_ileum_asv_tab,
                                         alpha_ileum_taxa_tab,
                                         alpha_ileum_metadata))

ileum_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)

# Calculate alpha diversity - rarefied counts
ileum_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)
alpha_div_plots <- list()

# preparing data frame
alpha_data <- data.frame(SampleID=ileum_mpse$Sample.x,
                         Observe=ileum_mpse$Observe,
                         Shannon=ileum_mpse$Shannon,
                         Simpson=ileum_mpse$Simpson,
                         Pielou=ileum_mpse$Pielou,
                         Group=ileum_mpse$Group,
                         Country=ileum_mpse$Country,
                         Patient=ileum_mpse$Patient)

# save the result
write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
          row.names = FALSE)

2.2.1 Plots

p_boxplot_alpha <- alpha_diversity_countries(alpha_data)
Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha

# see the results
p_boxplot_alpha

2.2.2 Linear Models

path = "../results/Q1/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))

Richness

# run model
results_model <- pairwise.lm(formula = "Observe ~ Group * Country",
                             factors=alpha_data$Group,
                             data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_observe <- results_model[[1]]
  results_model_observe_emeans <- results_model[[2]]
} else {
  results_model_observe <- results_model
  results_model_observe_emeans <- NA
}

# save the results
pc_observed <- list(); 
pc_observed[[segment]] <- results_model_observe
# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")
Raw results of linear model of richness estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -53.642 11.742 -4.568 0.000 0.000 ***
healthy , pre_ltx - CZ vs NO 11.219 10.488 1.070 0.287 0.361
healthy vs Grouppre_ltx:CountryNO 6.313 15.403 0.410 0.683 0.683
pre_ltx vs Grouppost_ltx 25.623 11.366 2.254 0.025 0.057
pre_ltx , post_ltx - CZ vs NO 17.533 12.615 1.390 0.166 0.249
pre_ltx vs Grouppost_ltx:CountryNO -35.933 15.921 -2.257 0.025 0.057
healthy vs Grouppost_ltx -28.019 9.237 -3.033 0.003 0.012 *
healthy , post_ltx - CZ vs NO 11.219 11.267 0.996 0.321 0.361
healthy vs Grouppost_ltx:CountryNO -29.620 14.629 -2.025 0.044 0.080
knitr::kable(results_model_observe_emeans,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
contrast Country estimate SE df t.ratio p.value
post_ltx - pre_ltx CZ 25.623 11.366 204 2.254 0.025
post_ltx - pre_ltx NO -10.310 11.148 204 -0.925 0.356
post_ltx - healthy CZ -28.019 9.237 207 -3.033 0.003
post_ltx - healthy NO -57.639 11.344 207 -5.081 0.000

Shannon

# run model
results_model <- pairwise.lm(formula = "Shannon ~ Group * Country",
                             factors=alpha_data$Group,
                             data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_shannon <- results_model[[1]]
  results_model_shannon_emeans <- results_model[[2]]
} else {
  results_model_shannon <- results_model
  results_model_shannon_emeans <- NA
}

# save the results
pc_shannon <- list(); 
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)
# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")
Raw results of linear model of Shannon estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.505 0.159 -3.171 0.002 0.017 *
healthy , pre_ltx - CZ vs NO 0.007 0.142 0.046 0.963 0.963
healthy vs Grouppre_ltx:CountryNO 0.144 0.209 0.687 0.493 0.634
pre_ltx vs Grouppost_ltx 0.372 0.148 2.514 0.013 0.049 *
pre_ltx , post_ltx - CZ vs NO 0.150 0.164 0.915 0.361 0.542
pre_ltx vs Grouppost_ltx:CountryNO -0.502 0.207 -2.423 0.016 0.049 *
healthy vs Grouppost_ltx -0.134 0.111 -1.200 0.231 0.416
healthy , post_ltx - CZ vs NO 0.007 0.136 0.048 0.962 0.963
healthy vs Grouppost_ltx:CountryNO -0.358 0.176 -2.031 0.044 0.098
knitr::kable(results_model_shannon_emeans,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
contrast Country estimate SE df t.ratio p.value
post_ltx - pre_ltx CZ 0.372 0.148 204 2.514 0.013
post_ltx - pre_ltx NO -0.130 0.145 204 -0.898 0.370
post_ltx - healthy CZ -0.134 0.111 207 -1.200 0.231
post_ltx - healthy NO -0.492 0.137 207 -3.597 0.000

Simpson

# run model
results_model <- pairwise.lm(formula = "Simpson ~ Group * Country",
                                     factors=alpha_data$Group,
                                     data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_simpson <- results_model[[1]]
  results_model_simpson_emeans <- results_model[[2]]
} else {
  results_model_simpson <- results_model
  results_model_simpson_emeans <- NA
}


# save the results
pc_simpson <- list(); 
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)
# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")
Raw results of linear model of Simpson estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.046 0.019 -2.481 0.014 0.129
healthy , pre_ltx - CZ vs NO -0.008 0.017 -0.466 0.642 0.692
healthy vs Grouppre_ltx:CountryNO 0.027 0.024 1.092 0.277 0.498
pre_ltx vs Grouppost_ltx 0.037 0.021 1.763 0.079 0.238
pre_ltx , post_ltx - CZ vs NO 0.019 0.023 0.811 0.418 0.627
pre_ltx vs Grouppost_ltx:CountryNO -0.060 0.029 -2.033 0.043 0.195
healthy vs Grouppost_ltx -0.009 0.016 -0.572 0.568 0.692
healthy , post_ltx - CZ vs NO -0.008 0.020 -0.396 0.692 0.692
healthy vs Grouppost_ltx:CountryNO -0.033 0.025 -1.303 0.194 0.436
knitr::kable(results_model_simpson_emeans,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
contrast Country estimate SE df t.ratio p.value
post_ltx - pre_ltx CZ 0.037 0.021 204 1.763 0.079
post_ltx - pre_ltx NO -0.023 0.021 204 -1.106 0.270

Pielou

# run model
results_model <- pairwise.lm(formula = "Pielou ~ Group * Country",
                                     factors=alpha_data$Group,
                                     data=alpha_data)

# check interaction

if (!is.data.frame(results_model)){
  results_model_pielou <- results_model[[1]]
  results_model_pielou_emeans <- results_model[[2]]
} else {
  results_model_pielou <- results_model
  results_model_pielou_emeans <- NA
}

# save the results
pc_pielou <- list(); 
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)
# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")
Raw results of linear model of Pielou estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.040 0.024 -1.708 0.090 0.285
healthy , pre_ltx - CZ vs NO -0.008 0.021 -0.394 0.694 0.943
healthy vs Grouppre_ltx:CountryNO 0.010 0.031 0.332 0.740 0.943
pre_ltx vs Grouppost_ltx 0.042 0.022 1.922 0.056 0.285
pre_ltx , post_ltx - CZ vs NO 0.002 0.024 0.082 0.935 0.943
pre_ltx vs Grouppost_ltx:CountryNO -0.051 0.030 -1.677 0.095 0.285
healthy vs Grouppost_ltx 0.001 0.017 0.072 0.943 0.943
healthy , post_ltx - CZ vs NO -0.008 0.020 -0.407 0.684 0.943
healthy vs Grouppost_ltx:CountryNO -0.041 0.027 -1.525 0.129 0.290
knitr::kable(results_model_pielou_emeans,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
contrast Country estimate SE df t.ratio p.value
post_ltx - pre_ltx CZ 0.042 0.022 204 1.922 0.056
post_ltx - pre_ltx NO -0.009 0.021 204 -0.435 0.664

2.2.3 Saving results

alpha_list <- list(
  Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
  Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
  Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
  Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
                   
write.xlsx(alpha_list, 
           file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))

2.3 Beta diversity

Calculating Aitchison distance (euclidean distance on clr-transformed data) at genus level (main analysis). In supplements, there are also bray-curtis and Jaccard distance at ASV and genus levels. Testing by PERMANOVA.

2.3.1 Main analysis

Genus level, Aitchison distance

level="genus"
path = "../results/Q1/beta_diversity"
pairwise_aitchison_raw <- list()
pca_plots_list <- list()

Aggregation, filtering

# Aggregation
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)
# Filtration
filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             ileum_metadata,
                             seq_depth_threshold=10000)
Removing 5 ASV(s)
filt_ileum_genus_tab <- filt_data[[1]]
filt_ileum_genus_taxa <- filt_data[[2]]
filt_ileum_metadata <- filt_data[[3]]
2.3.1.0.1 PERMANOVA
# prepare dataset
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_ileum_metadata$Group,
                           covariate = filt_ileum_metadata$Country, 
                           sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_ileum_metadata$Group,
                          covariate = filt_ileum_metadata$Country, 
                          interaction = TRUE, sim.method = "robust.aitchison",
                          p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 695.306 4.259 0.029 0.001 0.002 **
pre_ltx vs post_ltx 1 309.768 1.811 0.009 0.003 0.003 **
post_ltx vs healthy 1 914.409 5.454 0.025 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy , Country 1 482.074 2.953 0.020 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 699.075 4.087 0.019 0.001 0.001 ***
post_ltx vs healthy , Country 1 728.020 4.342 0.020 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy : Country 1 237.598 1.460 0.010 0.010 0.030 *
pre_ltx vs post_ltx : Country 1 237.265 1.390 0.007 0.021 0.032 *
post_ltx vs healthy : Country 1 202.888 1.211 0.006 0.106 0.106

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.1]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2)
  print(result_list)
}
$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1    275.1 0.02844 1.7272  0.003      0.003
Residual 59   9396.0 0.97156                         
Total    60   9671.1 1.00000                         

$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs     R2      F Pr(>F) p.adjusted
Fac       1    657.8 0.0474 3.9806  0.001  0.0013333
Residual 80  13220.8 0.9526                         
Total    81  13878.7 1.0000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2     F Pr(>F) p.adjusted
Cov       1    362.6 0.03086 2.165  0.001  0.0013333
Residual 68  11387.2 0.96914                        
Total    69  11749.8 1.00000                        

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1    357.1 0.03082 2.2579  0.001  0.0013333
Residual 71  11229.6 0.96918                         
Total    72  11586.7 1.00000                         

$pre_ltx_post_ltx_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs     R2      F Pr(>F) p.adjusted
Fac        1    184.3 0.0085 1.0629  0.314      0.314
Residual 124  21497.3 0.9915                         
Total    125  21681.6 1.0000                         

$pre_ltx_post_ltx_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1    362.8 0.02649 2.1767  0.001  0.0013333
Residual 80  13332.7 0.97351                         
Total    81  13695.4 1.00000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2     F Pr(>F) p.adjusted
Cov       1    362.6 0.03086 2.165  0.001  0.0013333
Residual 68  11387.2 0.96914                        
Total    69  11749.8 1.00000                        

$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    573.8 0.02389 3.3288  0.001  0.0013333
Residual 136  23442.8 0.97611                         
Total    137  24016.6 1.00000                         
2.3.1.0.2 Plots
p <- pca_plot_custom(filt_ileum_genus_tab,
                                 filt_ileum_genus_taxa,
                                 filt_ileum_metadata,
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend= FALSE)

# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p

# see the results
p

2.3.1.1 Saving results

write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]], 
           file = file.path(path,
           paste0("beta_diversity_results_", segment,".xlsx")))

2.3.2 Supplementary analysis

supplements_beta <- list()

2.3.2.1 Genus level

level="genus"
2.3.2.1.1 Bray-Curtis

PERMANOVA

pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_ileum_metadata$Group,
                           covariate = filt_ileum_metadata$Country, 
                           sim.method = "bray", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_ileum_metadata$Group,
                          covariate = filt_ileum_metadata$Country, 
                          interaction = TRUE, sim.method = "bray", 
                          p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 0.963 5.058 0.033 0.001 0.002 **
pre_ltx vs post_ltx 1 0.342 1.581 0.007 0.070 0.070
post_ltx vs healthy 1 1.551 7.718 0.034 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy , Country 1 1.132 5.948 0.039 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 1.991 9.200 0.042 0.001 0.001 ***
post_ltx vs healthy , Country 1 1.656 8.242 0.037 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy : Country 1 0.449 2.380 0.016 0.002 0.006 **
pre_ltx vs post_ltx : Country 1 0.330 1.527 0.007 0.075 0.075
post_ltx vs healthy : Country 1 0.376 1.878 0.008 0.013 0.020 *

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'bray')
  print(result_list)
}
$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs     R2      F Pr(>F) p.adjusted
Fac       1   0.4144 0.0363 2.2222  0.006      0.006
Residual 59  11.0035 0.9637                         
Total    60  11.4180 1.0000                         

$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1   0.9972 0.06156 5.2483  0.001  0.0013333
Residual 80  15.2005 0.93844                         
Total    81  16.1977 1.00000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1   0.9344 0.06093 4.4123  0.001  0.0013333
Residual 68  14.4010 0.93907                         
Total    69  15.3354 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1   0.6466 0.05193 3.8893  0.001  0.0013333
Residual 71  11.8030 0.94807                         
Total    72  12.4496 1.00000                         

$post_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1   0.7698 0.02676 3.7676  0.001      0.001
Residual 137  27.9925 0.97324                         
Total    138  28.7623 1.00000                         

$post_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs     R2      F Pr(>F) p.adjusted
Fac       1   1.1573 0.0793 6.0291  0.001      0.001
Residual 70  13.4368 0.9207                         
Total    71  14.5941 1.0000                         

$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1   1.3859 0.04469 6.3619  0.001      0.001
Residual 136  29.6263 0.95531                         
Total    137  31.0121 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1   0.6466 0.05193 3.8893  0.001      0.001
Residual 71  11.8030 0.94807                         
Total    72  12.4496 1.00000                         

Plots

p <- pca_plot_custom(filt_ileum_genus_tab,
                                 filt_ileum_genus_taxa,
                                 filt_ileum_metadata,
                                 measure = "bray",
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend= FALSE)

# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p

# see the results
p

2.3.2.1.2 Jaccard

PERMANOVA

pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_ileum_metadata$Group,
                           covariate = filt_ileum_metadata$Country, 
                           sim.method = "jaccard", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_ileum_metadata$Group,
                          covariate = filt_ileum_metadata$Country, 
                          interaction = TRUE, sim.method = "jaccard", 
                          p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 1.087 3.929 0.026 0.001 0.002 **
pre_ltx vs post_ltx 1 0.439 1.457 0.007 0.045 0.045 *
post_ltx vs healthy 1 1.633 5.675 0.026 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy , Country 1 1.232 4.452 0.030 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 1.965 6.513 0.030 0.001 0.001 ***
post_ltx vs healthy , Country 1 1.751 6.085 0.028 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy : Country 1 0.525 1.912 0.013 0.002 0.006 **
pre_ltx vs post_ltx : Country 1 0.405 1.346 0.006 0.072 0.072
post_ltx vs healthy : Country 1 0.450 1.569 0.007 0.017 0.026 *

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'jaccard')
  print(result_list)
}
$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1   0.5009 0.03035 1.8467  0.007      0.007
Residual 59  16.0018 0.96965                         
Total    60  16.5027 1.00000                         

$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1   1.1117 0.04768 4.0054  0.001  0.0013333
Residual 80  22.2037 0.95232                         
Total    81  23.3154 1.00000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1   0.9632 0.04568 3.2547  0.001  0.0013333
Residual 68  20.1235 0.95432                         
Total    69  21.0867 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1    0.794 0.04206 3.1176  0.001  0.0013333
Residual 71   18.082 0.95794                         
Total    72   18.876 1.00000                         

$post_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    0.865 0.02135 2.9894  0.001      0.001
Residual 137   39.627 0.97865                         
Total    138   40.491 1.00000                         

$post_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1   1.2181 0.05806 4.3147  0.001      0.001
Residual 70  19.7616 0.94194                         
Total    71  20.9797 1.00000                         

$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    1.407 0.03294 4.6318  0.001      0.001
Residual 136   41.306 0.96706                         
Total    137   42.713 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1    0.794 0.04206 3.1176  0.001      0.001
Residual 71   18.082 0.95794                         
Total    72   18.876 1.00000                         

Plots

p <- pca_plot_custom(filt_ileum_genus_tab,
                                 filt_ileum_genus_taxa,
                                 filt_ileum_metadata,
                                 measure = "jaccard",
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend= FALSE)

# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p

# see the results
p

2.3.2.2 ASV level

level="ASV"
2.3.2.2.1 Aitchison

PERMANOVA

# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_ileum_metadata$Group,
                           covariate = filt_ileum_metadata$Country, 
                           sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_ileum_metadata$Group,
                          covariate = filt_ileum_metadata$Country, 
                          interaction = TRUE, sim.method = "robust.aitchison",
                          p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("aitchison",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 1010.000 3.749 0.026 0.001 0.002 **
pre_ltx vs post_ltx 1 364.559 1.485 0.007 0.003 0.003 **
post_ltx vs healthy 1 1389.311 5.466 0.025 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy , Country 1 603.397 2.239 0.015 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 671.339 2.734 0.013 0.001 0.001 ***
post_ltx vs healthy , Country 1 789.105 3.105 0.014 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy : Country 1 364.013 1.354 0.009 0.014 0.021 *
pre_ltx vs post_ltx : Country 1 352.491 1.439 0.007 0.007 0.021 *
post_ltx vs healthy : Country 1 281.750 1.109 0.005 0.170 0.170

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2)
  print(result_list)
}
$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1      420 0.02629 1.5929  0.004      0.004
Residual 59    15557 0.97371                         
Total    60    15977 1.00000                         

$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1      954 0.04193 3.5009  0.001      0.002
Residual 80    21800 0.95807                         
Total    81    22754 1.00000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1    460.2 0.02579 1.8004  0.002  0.0026667
Residual 68  17380.9 0.97421                         
Total    69  17841.1 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1    507.2 0.02476 1.8028  0.001      0.002
Residual 71  19976.2 0.97524                         
Total    72  20483.4 1.00000                         

$pre_ltx_post_ltx_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    266.4 0.00872 1.0904  0.221      0.221
Residual 124  30296.1 0.99128                         
Total    125  30562.5 1.00000                         

$pre_ltx_post_ltx_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1    450.6 0.02237 1.8308  0.001      0.002
Residual 80  19691.5 0.97763                         
Total    81  20142.1 1.00000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1    460.2 0.02579 1.8004  0.002  0.0026667
Residual 68  17380.9 0.97421                         
Total    69  17841.1 1.00000                         

$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1      564 0.01699 2.3509  0.001      0.002
Residual 136    32607 0.98301                         
Total    137    33170 1.00000                         

PCoA

p <- pca_plot_custom(filt_ileum_asv_tab,
                           filt_ileum_taxa_tab,
                           filt_ileum_metadata,
                           show_boxplots = TRUE,
                           variable = "Group", size=3, show_legend= FALSE)

# save the results
supplements_beta[[paste("PCoA aitchison",level,segment)]] <- p

# see the results
p

2.3.2.2.2 Bray-Curtis

PERMANOVA

# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_ileum_metadata$Group,
                           covariate = filt_ileum_metadata$Country, 
                           sim.method = "bray", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_ileum_metadata$Group,
                          covariate = filt_ileum_metadata$Country, 
                          interaction = TRUE, sim.method = "bray", p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 1.628 5.225 0.035 0.001 0.002 **
pre_ltx vs post_ltx 1 0.519 1.563 0.007 0.024 0.024 *
post_ltx vs healthy 1 2.428 7.667 0.035 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy , Country 1 1.136 3.644 0.024 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 1.676 5.049 0.024 0.001 0.001 ***
post_ltx vs healthy , Country 1 1.667 5.264 0.024 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy : Country 1 0.469 1.511 0.010 0.027 0.040 *
pre_ltx vs post_ltx : Country 1 0.562 1.698 0.008 0.010 0.030 *
post_ltx vs healthy : Country 1 0.432 1.367 0.006 0.064 0.064

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'bray')
  print(result_list)
}
$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1   0.5756 0.02993 1.8204  0.005      0.005
Residual 59  18.6557 0.97007                         
Total    60  19.2313 1.00000                         

$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2     F Pr(>F) p.adjusted
Fac       1   1.5215 0.05848 4.969  0.001  0.0013333
Residual 80  24.4953 0.94152                        
Total    81  26.0168 1.00000                        

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1   0.8717 0.03713 2.6224  0.001  0.0013333
Residual 68  22.6028 0.96287                         
Total    69  23.4744 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1    0.733 0.03444 2.5326  0.001  0.0013333
Residual 71   20.548 0.96556                         
Total    72   21.281 1.00000                         

$pre_ltx_post_ltx_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    0.430 0.01027 1.2865  0.114      0.114
Residual 124   41.443 0.98973                         
Total    125   41.873 1.00000                         

$pre_ltx_post_ltx_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1   0.6509 0.02437 1.9985  0.004  0.0053333
Residual 80  26.0569 0.97563                         
Total    81  26.7079 1.00000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1   0.8717 0.03713 2.6224  0.001      0.002
Residual 68  22.6028 0.96287                         
Total    69  23.4744 1.00000                         

$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    1.366 0.02953 4.1389  0.001      0.002
Residual 136   44.897 0.97047                         
Total    137   46.263 1.00000                         

PCoA

p <- pca_plot_custom(filt_ileum_asv_tab,
                     filt_ileum_taxa_tab,
                     filt_ileum_metadata,
                     measure = "bray",
                     show_boxplots = TRUE,
                     variable = "Group", size=3, show_legend= FALSE)

# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p

# see the results
p

2.3.2.2.3 Jaccard

PERMANOVA

# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_ileum_metadata$Group,
                           covariate = filt_ileum_metadata$Country, 
                           sim.method = "jaccard", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_ileum_metadata$Group,
                          covariate = filt_ileum_metadata$Country, 
                          interaction = TRUE, sim.method = "jaccard",
                          p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 1.354 3.531 0.024 0.001 0.002 **
pre_ltx vs post_ltx 1 0.528 1.327 0.006 0.024 0.024 *
post_ltx vs healthy 1 1.921 4.964 0.023 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy , Country 1 0.994 2.593 0.018 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 1.384 3.480 0.017 0.001 0.001 ***
post_ltx vs healthy , Country 1 1.396 3.607 0.017 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy : Country 1 0.514 1.344 0.009 0.020 0.030 *
pre_ltx vs post_ltx : Country 1 0.571 1.439 0.007 0.010 0.030 *
post_ltx vs healthy : Country 1 0.488 1.262 0.006 0.041 0.041 *

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'jaccard')
  print(result_list)
}
$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1   0.5659 0.02422 1.4643  0.006      0.006
Residual 59  22.8020 0.97578                         
Total    60  23.3680 1.00000                         

$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1    1.302 0.04112 3.4309  0.001  0.0013333
Residual 80   30.352 0.95888                         
Total    81   31.654 1.00000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2     F Pr(>F) p.adjusted
Cov       1   0.7898 0.02836 1.985  0.001  0.0013333
Residual 68  27.0550 0.97164                        
Total    69  27.8447 1.00000                        

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1   0.7183 0.02679 1.9541  0.001  0.0013333
Residual 71  26.0995 0.97321                         
Total    72  26.8178 1.00000                         

$pre_ltx_post_ltx_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    0.468 0.00938 1.1736  0.115      0.115
Residual 124   49.489 0.99062                         
Total    125   49.957 1.00000                         

$pre_ltx_post_ltx_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1     0.63 0.01964 1.6025  0.003      0.004
Residual 80    31.47 0.98036                         
Total    81    32.10 1.00000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2     F Pr(>F) p.adjusted
Cov       1   0.7898 0.02836 1.985  0.001      0.002
Residual 68  27.0550 0.97164                        
Total    69  27.8447 1.00000                        

$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    1.165 0.02116 2.9397  0.001      0.002
Residual 136   53.904 0.97884                         
Total    137   55.069 1.00000                         

$post_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    1.058 0.01938 2.7075  0.001      0.001
Residual 137   53.521 0.98062                         
Total    138   54.578 1.00000                         

$post_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1   1.3507 0.04853 3.5701  0.001      0.001
Residual 70  26.4826 0.95147                         
Total    71  27.8333 1.00000                         

$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    1.165 0.02116 2.9397  0.001      0.001
Residual 136   53.904 0.97884                         
Total    137   55.069 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1   0.7183 0.02679 1.9541  0.001      0.001
Residual 71  26.0995 0.97321                         
Total    72  26.8178 1.00000                         

PCoA

p <- pca_plot_custom(filt_ileum_asv_tab,
                     filt_ileum_taxa_tab,
                     filt_ileum_metadata,
                     measure = "jaccard",
                     show_boxplots = TRUE,
                     variable = "Group", size=3, show_legend= FALSE)

# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p

# see the results
p

2.3.2.3 Saving results

write.xlsx(supplements_beta[!grepl("PCoA",names(supplements_beta))],
           file = file.path(path,
           paste0("supplements_beta_diversity_", segment,".xlsx")))

2.4 Univariate Analysis

Using two DAA tools - linDA and Maaslin. Only the intersection of sets selected by each tool was shown to minimize false positives. Taxa with a significant interaction effect were excluded based on individual post-hoc analysis of the Czech and Norwegian cohorts. Only taxa with significant log fold change that showed the same change direction in both countries were retained.

Main analysis is performed at genus level. In supplements, analysis is performed at ASV and phylum levels.

2.4.1 Main analysis

level="genus"

# needed paths
path = "../results/Q1/univariate_analysis"
path_maaslin=file.path("../intermediate_files/maaslin/Q1",level)
# variables
raw_linda_results_genus <- list();
raw_linda_results_genus[[segment]] <- list()
linda_results_genus <- list(); 
linda_results_genus[[segment]] <- list()

# country and interaction problems
list_country_union <- list()
list_intersections <- list()
list_venns <- list()
uni_statistics <- list()

# workbook for final df
wb <- createWorkbook()

# PSC effect
psc_effect <- list()

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

ileum_genus_asv_taxa_tab <- create_asv_taxa_table(ileum_genus_tab,
                                                  ileum_genus_taxa_tab)

2.4.1.1 pre_ltx vs healthy

We can assume, that the differentially abundant taxa between these two groups are caused by PSC disease and other factors, like liver cirrhosis and overall bad clinical condition.

group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
2.4.1.1.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 84 ASV(s)
Removing 10 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  143  samples and  184  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plots
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

2.4.1.1.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

2.4.1.1.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.1.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
2.4.1.1.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

2.4.1.2 pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
2.4.1.2.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 46 ASV(s)
Removing 6 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  208  samples and  172  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
2.4.1.2.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

2.4.1.2.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.2.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
2.4.1.2.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

2.4.1.3 post_ltx vs healthy

We can assume, that the differentially abundant taxa between these two groups remain because of the persistent gut MB alteration due to the PSC disease. Also, transplantation itself could add to this difference, as it has definitely strong impact on microbial composition.

group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
2.4.1.3.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 45 ASV(s)
Removing 2 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  211  samples and  180  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
2.4.1.3.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

2.4.1.3.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.3.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
2.4.1.3.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

2.4.1.4 Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
                                      ileum_taxa_tab) + xlab("") + ylab("")
min_clr -1.767585 
max_clr 6.908912 
min_log -5.062147 
max_log 4.892498 
dotheatmap_linda

Horizontal bar plot

p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))
Using SeqID as id variables
p_prevalence_final <- ggarrange(p_prevalence,
                                ggplot() + theme_minimal(),
                                nrow = 2,heights = c(1,0.09))

p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),
               p_prevalence_final,
               ncol=2,widths = c(1,0.3))
Warning: Removed 80 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 80 rows containing missing values or values outside the scale range
(`geom_text()`).
p

dot_heatmap_ileum <- p

2.4.1.5 PSC effect

To get only PSC associated taxa, we can intersect differentially abundant taxa from the previous analyses (see diagram below).

pre_LTx vs HC and Post_LTx vs HC intersection

imagename
A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)

psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
  
# see the results
psc_effect[[paste(segment,level)]] 
                           SeqID
10                 Enterorhabdus
16                   Barnesiella
19                 Butyricimonas
20                   Odoribacter
27                     Alistipes
29               Parabacteroides
41                    Holdemania
64                   Coprococcus
68              Fusicatenibacter
72             Lachnoclostridium
74  Lachnospiraceae_FCS020_group
105                Oscillibacter
118             Faecalibacterium
155                  Pseudomonas
164                 Enterococcus
                                                                                                        Taxonomy
10     k__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;f__Eggerthellaceae;g__Enterorhabdus
16                 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Barnesiellaceae;g__Barnesiella
19                k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Butyricimonas
20                  k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Odoribacter
27                     k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes
29              k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides
41               k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Holdemania
64                   k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus
68              k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter
72             k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium
74  k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_FCS020_group
105              k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter
118            k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium
155   k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas
164                   k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus
    log2FoldChange      p_value         padj
10       -1.050549 1.087023e-02 4.545734e-02
16       -3.177161 1.263665e-03 7.750481e-03
19       -2.900425 6.571752e-05 9.301556e-04
20       -2.971657 3.696980e-05 6.184040e-04
27       -2.559742 5.731016e-04 4.336862e-03
29       -3.418185 1.366753e-05 2.794251e-04
41       -1.213660 1.082333e-02 4.545734e-02
64       -3.868534 6.688508e-07 3.076714e-05
68       -2.864295 5.224510e-04 4.336862e-03
72       -1.781011 1.555333e-03 9.231654e-03
74       -2.231945 8.971276e-04 6.113758e-03
105      -1.970465 2.772265e-03 1.416936e-02
118      -3.360183 8.947299e-06 2.351861e-04
155       1.573079 8.758249e-03 4.028794e-02
164       4.892498 4.554602e-10 8.380467e-08

2.4.1.6 Saving results

# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
             overwrite = TRUE)

# PSC effect
write.xlsx(psc_effect[[paste(segment,level)]],file.path(path,paste0("psc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa

write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("significant_taxa_",segment,".xlsx")))

2.4.2 Supplementary Analysis

supplements_uni <- list()
supplements_wb <- createWorkbook()

2.4.2.1 ASV level

level="ASV"
path_maaslin="../intermediate_files/maaslin/Q1/ASV/"
raw_linda_results <- list();
raw_linda_results[[segment]] <- list()
linda_results <- list(); 
linda_results[[segment]] <- list()
2.4.2.1.1 pre_ltx vs healthy
group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA

# prepare the data
linda_data <- binomial_prep(ileum_asv_tab,
                            ileum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 1598 ASV(s)
Removing 146 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  143  samples and  453  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
2.4.2.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA

# prepare the data
linda_data <- binomial_prep(ileum_asv_tab,
                            ileum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 979 ASV(s)
Removing 68 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  208  samples and  349  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
2.4.2.1.3 post_ltx vs healthy
group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA

# prepare the data
linda_data <- binomial_prep(ileum_asv_tab,
                            ileum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 641 ASV(s)
Removing 104 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  211  samples and  444  features will be tested!
Pseudo-count approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
2.4.2.1.4 Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
                                      ileum_taxa_tab)
min_clr -1.065303 
max_clr 4.679688 
min_log -5.165415 
max_log 4.224397 
dotheatmap_linda

2.4.2.1.5 PSC effect

pre_LTx vs Healthy and Post_LTx vs Healthy intersection

imagename
A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)

psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
  
# see the results
psc_effect[[paste(segment,level)]] 
                                                                                                                                                                                                                                                                                                                                                                                                               SeqID
22  TAGGGAATTTTCGGCAATGGGGGAAACCCTGACCGAGCAACGCCGCGTGAAGGAAGAAGTAATTCGTTATGTAAACTTCTGTCATAGAGGAAGAACGGTGGATATAGGGAATGATATCCAAGTGACGGTACTCTATAAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCGAGCGTTATCCGGAATTATTGGGCGTAAAGAGGGAGCAGGCGGCACTAAGGGTCTGTGGTGAAAGATCGAAGCTTAACTTCGGTAAGCCATGGAAACCGTAGAGCTAGAGTGTGTGAGAGGATCGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGACGATCTGGCGCATAACTGACGCTC
39  TGAGGAATATTGGGCAATGGAGGAAACTCTGACCCAGCCATGCCGCGTGAGTGAAGAAGGTTTTCGAATTGTAAAGCTCTTTCGGATGGGACGATGATGACGGTACCATCTAAAGAAGCCCCGGCAAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGGGTGTGTAGGCGGATTTGTAAGATAGTGGTGAAATACCTGAGCTCAACTTAGGAATTGCCATTATAACTATAGATCTGGAGTGACAGAGAGGATATTGGAATACCCAGTGTAGAGGTGAAATTCGTAGATATTGGGTAGAACACCAGTGGCGAAGGCGAGTATCTGGCTGTCAACTGACGCTGAGGCACGAAAGCATGGGGATCAAA
70  TGAGGAATATTGGTCAATGGACGCAAGTCTGAACCAGCCATGCCGCGTGCAGGATGACGGCTCTATGAGTTGTAAACTGCTTTTGTACGAGGGTAAACGCAGATACGTGTATCTGTCTGAAAGTATCGTACGAATAAGGATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTTTGATAAGTTAGAGGTGAAATTTCGGGGCTCAACCCTGAACGTGCCTCTAATACTGTTGAGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAACTATATCTGACGTTGAGGC
83  TGAGGAATATTGGTCAATGGGCGAGAGCCTGAACCAGCCAAGTAGCGTGAAGGATGAAGGCTCTATGGGTCGTAAACTTCTTTTATATGGGAATAAAGTTTTCCACGTGTGGAATTTTGTATGTACCATATGAATAAGGATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGTGGATTGTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGAAACTGGCAGTCTTGAGTACAGTAGAGGTGGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTCACTAGACTGTCACTGACACTGATGC
133 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATCTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGCAGACGGCACTGCAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGTAGAGCTAGAGTGCTGGAGAGGCAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTGCTGGACAGTAACTGACGTTCAGGCTCGAAAGCGTGGGGAGCAAA
134 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCTGTGCAAGTCTGAAGTGAAAGGCATGGGCTCAACCTGTGGACTGCTTTGGAAACTGTGCAGCTAGAGTGTCGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
139 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATCTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGCAGACGGCGATGCAAGTCTGGAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGTATGGCTAGAGTGCTGGAGAGGCAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTGCTGGACAGTAACTGACGTTCAGGCTCGAAAGCGTGGGGAGCAAA
140 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGAAGCAAGTCTGAAGTGAAAACCCAGGGCTCAACCCTGGGACTGCTTTGGAAACTGTTTTGCTAGAGTGTCGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGATAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
141 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTAAAGCAAGTCTGAAGTGAAAGCCCGCGGCTCAACTGCGGGACTGCTTTGGAAACTGTTTAACTGGAGTGTCGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGACTTACTGGACGATAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
166 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCAGCGCAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGAATGGCTTTGGAAACTGTGCAGCTAGAGTACCGGAGGGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
169 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGAGACAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGCCTTGCTAGAGTGCTGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACAGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
196 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCATTGCAAGCCAGATGTGAAAGCCCGGGGCTCAACCCCGGGACTGCATTTGGAACTGTAGAGCTAGAGTGTCGGAGAGGCAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTGCTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
198 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTATGGCAAGTCTGATGTGAAAGGCCAGGGCTCAACCCTGGGACTGCATTGGAAACTGTCGAACTAGAGTGTCGGAGAGGCAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTGCTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
207 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGCGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCACCGGCTAAATACGTGCCAGCAGCCGCGGTAATACGTATGGTGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCTGTGTAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTATGCAGCTAGAGTGTCGGAGAGGTAAGTGGAATTCCCAGTGTAGCGGTGAAATGCGTAGATATTGGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
212 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGCGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGAGGCAAGTCTGATGTGAAAACCCGGGGCTCAACCCCGTGACTGCATTGGAAACTGTTTTGCTTGAGTGCCGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGGCAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
220 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGCGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGATAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCAAGGCAAGTCTGATGTGAAAACCCAGGGCTTAACCCTGGGACTGCATTGGAAACTGTCTGGCTCGAGTGCCGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTACTGGACGGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
240 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTCAAGCAAGTCAGAAGTGAAAGGCTGGGGCTCAACCCCGGGACTGCTTTTGAAACTGTTTGACTGGAGTGCTGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACAGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
270 TGGGGAATATTGCGCAATGGGGGAAACCCTGACGCAGCAACGCCGCGTGATTGAAGAAGGCCTTCGGGTTGTAAAGATCTTTAATCAGGGACGAAACAAATGACGGTACCTGAAGAATAAGCTCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGAGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGCGCGCAGGCGGGCCGGTAAGTTGGAAGTGAAATCTATGGGCTTAACCCATAAACTGCTTTCAAAACTGCTGGTCTTGAGTGATGGAGAGGCAGGCGGAATTCCGTGTGTAGCGGTGAAATGCGTAGATATACGGAGGAACACCAGTGGCGAAGGCGGCCTGCTGGACATTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGC
283 TGGGGAATATTGGGCAATGGACGCAAGTCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTGTCAGGGAAGAGTAGAAGACGGTACCTGACGAATAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGTGTAAAGGGCGTGCAGCCGGGCCGGCAAGTCAGATGTGAAATCTGGAGGCTTAACCTCCAAACTGCATTTGAAACTGTAGGTCTTGAGTACCGGAGAGGTTATCGGAATTCCTTGTGTAGCGGTGAAATGCGTAGATATAAGGAAGAACACCAGTGGCGAAGGCGGATAACTGGACGGCAACTGACGGTGAGGCGCGAAAGCGTGGGGAGCA
301 TGGGGAATATTGGGCAATGGGCGCAAGCCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTATGAGGGACGAAGGAAGTGACGGTACCTCATGAATAAGCTCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGAGCGAGCGTTATCCGGATTTACTGGGTGTAAAGGGCGTGTAGGCGGGGAAGCAAGTCAGATGTGAAAACCAGTGGCTCAACCACTGGCCTGCATTTGAAACTGTTTTTCTTGAGTGATGGAGAGGCAGGCGGAATTCCGTGTGTAGCGGTGAAATGCGTAGATATACGGAGGAACACCAGTGGCGAAGGCGGCCTGCTGGACATTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGC
305 TGGGGAATATTGGGCAATGGGCGCAAGCCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTCTCAGGGACGAACAAATGACGGTACCTGAGGAATAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGCGTGTAGGCGGGAAGGCAAGTCAGATGTGAAAACTATGGGCTCAACCCATAGCCTGCATTTGAAACTGTTTTTCTTGAGTGCTGGAGAGGCAATCGGAATTCCGTGTGTAGCGGTGAAATGCGTAGATATACGGAGGAACACCAGTGGCGAAGGCGGATTGCTGGACAGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCA
338 TGGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGACGGCCTTCGGGTTGTAAAGCTCTGTTAATCGGGACGAAAGGCCTTCTTGCGAATAGTGAGAAGGATTGACGGTACCGGAATAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGATAGGTCAGTCTGTCTTAAAAGTTCGGGGCTTAACCCCGTGATGGGATGGAAACTGCCAATCTAGAGTATCGGAGAGGAAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGACTTTCTGGACGAAAACTGACGCT
374 TGGGGGATATTGCACAATGGAGGAAACTCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGACGCAAGTCTGAAGTGAAATACCCGGGCTCAACCTGGGAACTGCTTTGGAAACTGTGTTGCTAGAGTGCTGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTACTGGACAGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
401 TAGGGAATCTTCGGCAATGGACGAAAGTCTGACCGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAACTCTGTTGTTAGAGAAGAACAAGGATGAGAGTAACTGTTCATCCCTTGACGGTATCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGGAAACTGGGAGACTTGAGTGCAGAAGAGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGTAACTGACGCT
426 TGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCGACGCCGCGTGCGGGATGGAGGCCTTCGGGTTGTAAACCGCTTTTGATCGGGAGCAAGCCCTTCGGGGTGAGTGTACCTTTCGAATAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTATCCGGAATTATTGGGCGTAAAGGGCTCGTAGGCGGTTCGTCGCGTCCGGTGTGAAAGCCCATCGCTTAACGGTGGGTCTGCGCCGGGTACGGGCGGGCTGGAGTGCGGTAGGGGAGACTGGAATTCCCGGTGTAACGGTGGAATGTGTAGATATCGGGAAGAACACCAATGGCGAAGGCAGGTCTCTGGGCCGTCACTGACGCTGAGGAGCGAAAGCGTG
                                                                                                                                Taxonomy
22  k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelatoclostridiaceae;g__Erysipelotrichaceae_UCG-003;s__unassigned
39                    k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodospirillales;f__uncultured;g__uncultured;s__unassigned
70                               k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes;s__unassigned
83                            k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__unassigned
133                      k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
134                                  k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Dorea;s__unassigned
139                      k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
140                      k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
141                      k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
166                             k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
169                             k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
196                                  k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Dorea;s__unassigned
198                             k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
207                            k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus;s__unassigned
212                                 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__CAG-56;s__unassigned
220                       k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter;s__unassigned
240                             k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
270                      k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Butyricicoccaceae;g__Butyricicoccus;s__unassigned
283                        k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter;s__unassigned
301                           k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__uncultured;s__unassigned
305                       k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Intestinimonas;s__unassigned
338         k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella;s__unassigned
374                k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_UCG-004;s__unassigned
401                             k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus;s__unassigned
426        k__Bacteria;p__Actinobacteriota;c__Actinobacteria;o__Bifidobacteriales;f__Bifidobacteriaceae;g__Bifidobacterium;s__unassigned
    log2FoldChange      p_value         padj
22      -4.4298696 1.676032e-06 1.265404e-04
39      -2.0538259 2.203655e-03 2.697988e-02
70      -3.5115154 1.659708e-04 4.699048e-03
83      -1.9687275 1.190853e-04 4.149666e-03
133     -5.1654153 4.258197e-07 3.857926e-05
134     -4.4957248 1.521108e-08 3.757406e-06
139     -2.9269443 4.470062e-04 8.099752e-03
140      3.8241704 8.725084e-06 4.391626e-04
141      3.2182184 1.288509e-04 4.169248e-03
166     -2.5678980 8.934631e-04 1.349129e-02
169     -5.0632949 2.091495e-07 2.368618e-05
196     -1.7469483 3.668450e-03 3.661578e-02
198     -2.4204302 2.028649e-04 5.105433e-03
207     -5.0455471 1.658899e-08 3.757406e-06
212     -3.0907151 1.418501e-03 1.835946e-02
220     -2.8612176 8.567099e-04 1.349129e-02
240     -2.1804819 8.890519e-04 1.349129e-02
270     -2.1661995 3.862513e-03 3.722805e-02
283     -2.8158488 2.758436e-05 1.135974e-03
301     -2.1038157 4.028492e-03 3.801889e-02
305     -1.9220557 1.012676e-03 1.433569e-02
338      2.3054457 3.730015e-04 7.756897e-03
374     -2.9436340 6.863898e-05 2.591121e-03
401      3.9907909 3.116171e-08 4.705418e-06
426      0.8645068 5.884295e-03 4.759975e-02

2.4.2.2 Phylum level

level="phylum"
path_maaslin="../intermediate_files/maaslin/Q1/Phylum/"
raw_linda_results_phylum <- list();
raw_linda_results_phylum[[segment]] <- list()
linda_results_phylum <- list(); 
linda_results_phylum[[segment]] <- list()

Aggregate taxa

phylum_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = "Phylum")

ileum_phylum_tab <- phylum_data[[1]]
ileum_phylum_taxa_tab <- phylum_data[[2]]
2.4.2.2.1 pre_ltx vs healthy
group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA

# prepare the data
linda_data <- binomial_prep(ileum_phylum_tab,
                            ileum_phylum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 2 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  143  samples and  10  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_phylum[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_phylum[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)
Using Phylum for naming
volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 
Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")
Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_phylum, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_phylum[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
2.4.2.2.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA

# prepare the data
linda_data <- binomial_prep(ileum_phylum_tab,
                            ileum_phylum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 2 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  208  samples and  10  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_phylum[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_phylum[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)
Using Phylum for naming
volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 
Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")
Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_phylum, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_phylum[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
2.4.2.2.3 post_ltx vs healthy
group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA

# prepare the data
linda_data <- binomial_prep(ileum_phylum_tab,
                            ileum_phylum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 1 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  211  samples and  10  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_phylum[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_phylum[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)
Using Phylum for naming
volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 
Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")
Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_phylum, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_phylum[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
2.4.2.2.4 Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
                                      ileum_taxa_tab)
min_clr -4.119052 
max_clr 3.1473 
min_log -2.944293 
max_log 2.82227 
dotheatmap_linda

2.4.2.2.5 PSC effect

pre_LTx vs Healthy and Post_LTx vs Healthy intersection

imagename
A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)

psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
  
# see the results
psc_effect[[paste(segment,level)]] 
[1] SeqID          Taxonomy       log2FoldChange p_value        padj          
<0 rows> (or 0-length row.names)

2.4.2.3 Saving results

# ALL DATA
saveWorkbook(supplements_wb,file.path(path,paste0("supplements_uni_analysis_wb_",segment,".xlsx")),overwrite = TRUE)

# PSC effect
write.xlsx(psc_effect,
          file.path(path,paste0("supplements_psc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa
write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("supplements_significant_taxa_",segment,".xlsx")))

2.5 Machine learning

Binary classification was performed using four independent models: Elastic Net (ENET), RF (RF), Gradient Boosting (GBoost), K-nearest Neighbors (kNN). Training and validation were performed via bootstrapping (N=500) on clr-transformed data. Model performance metrics expressed as AUC were calculated based on an out-of-sample principle.

The supplementary analysis contains training and validating at clr-transformed data at ASV level and relative-abundance data at ASV and genus level.

path = "../results/Q1/models"

2.5.1 ENET

model="enet"

2.5.1.1 ASV level

level="ASV"
2.5.1.1.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ <- list()
models_cm <- list()
betas <- list()
roc_cs <- list()

models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.4000000
lambda                          0.0256910
auc                             1.0000000
auc_czech                       1.0000000
auc_no                          1.0000000
auc_optimism_corrected          0.9562938
auc_optimism_corrected_CIL      0.8940903
auc_optimism_corrected_CIU      0.9955143
accuracy                        1.0000000
accuracy_czech                        NaN
accuracy_no                     1.0000000
accuracy_optimism_corrected     0.8824810
accuracy_optimism_corrected_CIL 0.7808636
accuracy_optimism_corrected_CIU 0.9600000
enet_model$conf_matrices
$original
    Predicted
True  0  1
   0 73  0
   1  0 70

$czech
    Predicted
True  0  1
   0 37  0
   1  0 24

$no
    Predicted
True  0  1
   0 36  0
   1  0 46
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

2.5.1.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.2000000
lambda                          0.1292328
auc                             0.9967909
auc_czech                       0.9979575
auc_no                          0.9969807
auc_optimism_corrected          0.8645187
auc_optimism_corrected_CIL      0.7852767
auc_optimism_corrected_CIU      0.9318161
accuracy                        0.9278846
accuracy_czech                        NaN
accuracy_no                     0.9512195
accuracy_optimism_corrected     0.8220597
accuracy_optimism_corrected_CIL 0.7333333
accuracy_optimism_corrected_CIU 0.8924940
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 138   0
   1  15  55

$czech
    Predicted
True   0   1
   0 102   0
   1  11  13

$no
    Predicted
True  0  1
   0 36  0
   1  4 42
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

2.5.1.1.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.20000000
lambda                          0.04799701
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.95799961
auc_optimism_corrected_CIL      0.89999175
auc_optimism_corrected_CIU      0.98977781
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.89383497
accuracy_optimism_corrected_CIL 0.82118019
accuracy_optimism_corrected_CIU 0.96028870
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0  73   0
   1   0 138

$czech
    Predicted
True   0   1
   0  37   0
   1   0 102

$no
    Predicted
True  0  1
   0 36  0
   1  0 36
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

2.5.1.2 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
2.5.1.2.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.20000000
lambda                          0.03984448
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.96582239
auc_optimism_corrected_CIL      0.91575521
auc_optimism_corrected_CIU      0.99568966
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.89754798
accuracy_optimism_corrected_CIL 0.80769231
accuracy_optimism_corrected_CIU 0.97141139
enet_model$conf_matrices
$original
    Predicted
True  0  1
   0 73  0
   1  0 70

$czech
    Predicted
True  0  1
   0 37  0
   1  0 24

$no
    Predicted
True  0  1
   0 36  0
   1  0 46
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

2.5.1.2.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.0000000
lambda                          0.6674012
auc                             0.9726708
auc_czech                       0.9616013
auc_no                          0.9800725
auc_optimism_corrected          0.8344709
auc_optimism_corrected_CIL      0.7487803
auc_optimism_corrected_CIU      0.9154473
accuracy                        0.8653846
accuracy_czech                        NaN
accuracy_no                     0.8780488
accuracy_optimism_corrected     0.7788698
accuracy_optimism_corrected_CIL 0.6973684
accuracy_optimism_corrected_CIU 0.8516141
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 136   2
   1  26  44

$czech
    Predicted
True   0   1
   0 101   1
   1  17   7

$no
    Predicted
True  0  1
   0 35  1
   1  9 37
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

2.5.1.2.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,group, 
                                     usage="ml_clr")
Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.00000000
lambda                          0.07097722
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.96566611
auc_optimism_corrected_CIL      0.92200341
auc_optimism_corrected_CIU      0.99386451
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.90031834
accuracy_optimism_corrected_CIL 0.82800926
accuracy_optimism_corrected_CIU 0.96054545
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0  73   0
   1   0 138

$czech
    Predicted
True   0   1
   0  37   0
   1   0 102

$no
    Predicted
True  0  1
   0 36  0
   1  0 36
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

2.5.1.3 Saving results

models_summ_df_ileum <- do.call(rbind, 
  models_summ[grep(segment,names(models_summ),value = TRUE)])

write.csv(models_summ_df_ileum,file.path(path,paste0("elastic_net_",segment,".csv")))

2.5.2 Supplementary models

supplements_models <- list()

2.5.2.1 CLR-transformed data

2.5.2.1.1 kNN
model="knn"
2.5.2.1.1.1 ASV level
level="ASV"

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               18.0000000
auc                              0.9239726
auc_optimism_corrected           0.8707255
auc_optimism_corrected_CIL       0.7611443
auc_optimism_corrected_CIU       0.9538139
accuracy                         0.7762238
accuracy_optimism_corrected      0.7606251
accuracy_optimism_corrected_CIL  0.6400000
accuracy_optimism_corrected_CIU  0.8714046
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               16.0000000
auc                              0.8342650
auc_optimism_corrected           0.6861098
auc_optimism_corrected_CIL       0.5546110
auc_optimism_corrected_CIU       0.7936124
accuracy                         0.7067308
accuracy_optimism_corrected      0.6707627
accuracy_optimism_corrected_CIL  0.5697674
accuracy_optimism_corrected_CIU  0.7682297
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               27.0000000
auc                              0.9100159
auc_optimism_corrected           0.8695718
auc_optimism_corrected_CIL       0.7804090
auc_optimism_corrected_CIU       0.9534194
accuracy                         0.8720379
accuracy_optimism_corrected      0.8197023
accuracy_optimism_corrected_CIL  0.7302235
accuracy_optimism_corrected_CIU  0.8972568
roc_c

2.5.2.1.1.2 Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# p-value
print(paste("p_value:",mean(knn_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               12.0000000
auc                              0.9568493
auc_optimism_corrected           0.8867898
auc_optimism_corrected_CIL       0.7995277
auc_optimism_corrected_CIU       0.9600000
accuracy                         0.7692308
accuracy_optimism_corrected      0.7505565
accuracy_optimism_corrected_CIL  0.6074755
accuracy_optimism_corrected_CIU  0.8600000
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               17.0000000
auc                              0.8493789
auc_optimism_corrected           0.7274604
auc_optimism_corrected_CIL       0.6082378
auc_optimism_corrected_CIU       0.8332776
accuracy                         0.7692308
accuracy_optimism_corrected      0.6961743
accuracy_optimism_corrected_CIL  0.5961806
accuracy_optimism_corrected_CIU  0.7950340
# p-value
print(paste("p_value:",mean(knn_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               12.0000000
auc                              0.9638674
auc_optimism_corrected           0.9214737
auc_optimism_corrected_CIL       0.8567085
auc_optimism_corrected_CIU       0.9739614
accuracy                         0.8530806
accuracy_optimism_corrected      0.7997011
accuracy_optimism_corrected_CIL  0.6820732
accuracy_optimism_corrected_CIU  0.8987821
# p-value
print(paste("p_value:",mean(knn_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

2.5.2.1.2 RF
model="rf"
2.5.2.1.2.1 ASV level
level="ASV"

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "105"      
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.946523" 
auc_optimism_corrected_CIL      "0.8726572"
auc_optimism_corrected_CIU      "0.9949915"
accuracy                        "1"        
accuracy_optimism_corrected     "0.868117" 
accuracy_optimism_corrected_CIL "0.7609048"
accuracy_optimism_corrected_CIU "0.960024" 
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "115"      
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.8555864"
auc_optimism_corrected_CIL      "0.7560028"
auc_optimism_corrected_CIU      "0.941253" 
accuracy                        "1"        
accuracy_optimism_corrected     "0.8330184"
accuracy_optimism_corrected_CIL "0.7464789"
accuracy_optimism_corrected_CIU "0.9102992"
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "115"      
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.8996574"
auc_optimism_corrected_CIL      "0.8172801"
auc_optimism_corrected_CIU      "0.9705953"
accuracy                        "1"        
accuracy_optimism_corrected     "0.874331" 
accuracy_optimism_corrected_CIL "0.7870625"
accuracy_optimism_corrected_CIU "0.9412131"
roc_c

2.5.2.1.2.2 Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "47"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.9495441"
auc_optimism_corrected_CIL      "0.8771007"
auc_optimism_corrected_CIU      "0.9959453"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8764871"
accuracy_optimism_corrected_CIL "0.7804167"
accuracy_optimism_corrected_CIU "0.9615385"
# p-value
print(paste("p_value:",mean(rf_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "57"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.8244378"
auc_optimism_corrected_CIL      "0.7188584"
auc_optimism_corrected_CIU      "0.9062878"
accuracy                        "1"        
accuracy_optimism_corrected     "0.79629"  
accuracy_optimism_corrected_CIL "0.7037701"
accuracy_optimism_corrected_CIU "0.8726469"
# p-value
print(paste("p_value:",mean(rf_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "133"      
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.9117739"
auc_optimism_corrected_CIL      "0.8210604"
auc_optimism_corrected_CIU      "0.978767" 
accuracy                        "1"        
accuracy_optimism_corrected     "0.8682508"
accuracy_optimism_corrected_CIL "0.7820513"
accuracy_optimism_corrected_CIU "0.9421487"
# p-value
print(paste("p_value:",mean(rf_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

2.5.2.1.3 Gradient boosting
model="gb"
2.5.2.1.3.1 ASV level
level="ASV"

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 1.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               1.0000000
auc_optimism_corrected            0.9548905
auc_optimism_corrected_CIL        0.9022039
auc_optimism_corrected_CIU        0.9919377
accuracy                          1.0000000
accuracy_optimism_corrected       0.8793789
accuracy_optimism_corrected_CIL   0.7884615
accuracy_optimism_corrected_CIU   0.9604118
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         100.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               0.9998965
auc_optimism_corrected            0.8678054
auc_optimism_corrected_CIL        0.7868984
auc_optimism_corrected_CIU        0.9351204
accuracy                          0.9903846
accuracy_optimism_corrected       0.8340986
accuracy_optimism_corrected_CIL   0.7482500
accuracy_optimism_corrected_CIU   0.9085227
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         100.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               1.0000000
auc_optimism_corrected            0.9250994
auc_optimism_corrected_CIL        0.8522212
auc_optimism_corrected_CIU        0.9818952
accuracy                          1.0000000
accuracy_optimism_corrected       0.8766714
accuracy_optimism_corrected_CIL   0.8050354
accuracy_optimism_corrected_CIU   0.9358974
roc_c

2.5.2.1.3.2 Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                             reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 1.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               1.0000000
auc_optimism_corrected            0.9727397
auc_optimism_corrected_CIL        0.9275174
auc_optimism_corrected_CIU        1.0000000
accuracy                          1.0000000
accuracy_optimism_corrected       0.9037504
accuracy_optimism_corrected_CIL   0.8155329
accuracy_optimism_corrected_CIU   0.9719540
# p-value
print(paste("p_value:",mean(gbm_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         100.0000000
interaction.depth                 1.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               0.9909938
auc_optimism_corrected            0.8303197
auc_optimism_corrected_CIL        0.7403750
auc_optimism_corrected_CIU        0.9115621
accuracy                          0.9278846
accuracy_optimism_corrected       0.7987144
accuracy_optimism_corrected_CIL   0.7053850
accuracy_optimism_corrected_CIU   0.8767736
# p-value
print(paste("p_value:",mean(gbm_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         200.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               1.0000000
auc_optimism_corrected            0.9369762
auc_optimism_corrected_CIL        0.8678590
auc_optimism_corrected_CIU        0.9834064
accuracy                          1.0000000
accuracy_optimism_corrected       0.8813001
accuracy_optimism_corrected_CIL   0.8133333
accuracy_optimism_corrected_CIU   0.9448440
# p-value
print(paste("p_value:",mean(gbm_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

2.5.2.2 Relative abundances

2.5.2.2.1 Elastic net
2.5.2.2.1.1 ASV level
level="ASV"

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           1.00000000
lambda                          0.06282139
auc                             0.99726027
auc_czech                       0.99662162
auc_no                          0.99818841
auc_optimism_corrected          0.92410231
auc_optimism_corrected_CIL      0.83677721
auc_optimism_corrected_CIU      0.97242424
accuracy                        0.93706294
accuracy_czech                         NaN
accuracy_no                     0.93902439
accuracy_optimism_corrected     0.85997837
accuracy_optimism_corrected_CIL 0.77602041
accuracy_optimism_corrected_CIU 0.91766690
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.0000000
lambda                          3.1074497
auc                             0.9894410
auc_czech                       0.9950980
auc_no                          0.9734300
auc_optimism_corrected          0.7340394
auc_optimism_corrected_CIL      0.6170852
auc_optimism_corrected_CIU      0.8336118
accuracy                        0.7259615
accuracy_czech                        NaN
accuracy_no                     0.5243902
accuracy_optimism_corrected     0.6974512
accuracy_optimism_corrected_CIL 0.5967411
accuracy_optimism_corrected_CIU 0.8050044
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.2000000
lambda                          0.0941553
auc                             1.0000000
auc_czech                       1.0000000
auc_no                          1.0000000
auc_optimism_corrected          0.8878147
auc_optimism_corrected_CIL      0.8182230
auc_optimism_corrected_CIU      0.9395208
accuracy                        0.9620853
accuracy_czech                        NaN
accuracy_no                     0.9861111
accuracy_optimism_corrected     0.8346693
accuracy_optimism_corrected_CIL 0.7516067
accuracy_optimism_corrected_CIU 0.9093405
roc_c

2.5.2.2.1.2 Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.2000000
lambda                          0.1357028
auc                             0.9982387
auc_czech                       0.9954955
auc_no                          1.0000000
auc_optimism_corrected          0.9386177
auc_optimism_corrected_CIL      0.8965000
auc_optimism_corrected_CIU      0.9793945
accuracy                        0.9860140
accuracy_czech                        NaN
accuracy_no                     1.0000000
accuracy_optimism_corrected     0.8663045
accuracy_optimism_corrected_CIL 0.8016509
accuracy_optimism_corrected_CIU 0.9358844
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.40000000
lambda                          0.08652139
auc                             0.94720497
auc_czech                       0.95751634
auc_no                          0.88828502
auc_optimism_corrected          0.73722094
auc_optimism_corrected_CIL      0.65355232
auc_optimism_corrected_CIU      0.81212375
accuracy                        0.82211538
accuracy_czech                         NaN
accuracy_no                     0.73170732
accuracy_optimism_corrected     0.71869940
accuracy_optimism_corrected_CIL 0.61986111
accuracy_optimism_corrected_CIU 0.79628473
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.2000000
lambda                          0.1352860
auc                             0.9857058
auc_czech                       0.9809221
auc_no                          0.9915123
auc_optimism_corrected          0.8571252
auc_optimism_corrected_CIL      0.8014650
auc_optimism_corrected_CIU      0.8939396
accuracy                        0.9099526
accuracy_czech                        NaN
accuracy_no                     0.9583333
accuracy_optimism_corrected     0.7924925
accuracy_optimism_corrected_CIL 0.7205995
accuracy_optimism_corrected_CIU 0.8558642
roc_c

2.5.2.2.2 kNN
2.5.2.2.2.1 ASV level
level="ASV"

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               10.0000000
auc                              0.9155577
auc_optimism_corrected           0.8288216
auc_optimism_corrected_CIL       0.6990440
auc_optimism_corrected_CIU       0.9324125
accuracy                         0.6993007
accuracy_optimism_corrected      0.6665810
accuracy_optimism_corrected_CIL  0.5590566
accuracy_optimism_corrected_CIU  0.7549259
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               20.0000000
auc                              0.8137164
auc_optimism_corrected           0.6656270
auc_optimism_corrected_CIL       0.5828798
auc_optimism_corrected_CIU       0.7357973
accuracy                         0.6778846
accuracy_optimism_corrected      0.6814455
accuracy_optimism_corrected_CIL  0.6193304
accuracy_optimism_corrected_CIU  0.7416264
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               15.0000000
auc                              0.8773576
auc_optimism_corrected           0.8241731
auc_optimism_corrected_CIL       0.7426389
auc_optimism_corrected_CIU       0.9093236
accuracy                         0.8578199
accuracy_optimism_corrected      0.7672256
accuracy_optimism_corrected_CIL  0.6851333
accuracy_optimism_corrected_CIU  0.8682099
roc_c

2.5.2.2.2.2 Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               11.0000000
auc                              0.9076321
auc_optimism_corrected           0.8187868
auc_optimism_corrected_CIL       0.7649740
auc_optimism_corrected_CIU       0.8737386
accuracy                         0.7622378
accuracy_optimism_corrected      0.7045763
accuracy_optimism_corrected_CIL  0.6121154
accuracy_optimism_corrected_CIU  0.7918367
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               19.0000000
auc                              0.8088509
auc_optimism_corrected           0.7182619
auc_optimism_corrected_CIL       0.6604457
auc_optimism_corrected_CIU       0.7917652
accuracy                         0.7403846
accuracy_optimism_corrected      0.7189495
accuracy_optimism_corrected_CIL  0.6446429
accuracy_optimism_corrected_CIU  0.7566076
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               21.0000000
auc                              0.8874330
auc_optimism_corrected           0.8009876
auc_optimism_corrected_CIL       0.7335368
auc_optimism_corrected_CIU       0.8864556
accuracy                         0.8151659
accuracy_optimism_corrected      0.7465345
accuracy_optimism_corrected_CIL  0.6166738
accuracy_optimism_corrected_CIU  0.8089339
roc_c

2.5.2.2.3 RF
2.5.2.2.3.1 ASV level
level="ASV"

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "71"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.9749018"
auc_optimism_corrected_CIL      "0.9501443"
auc_optimism_corrected_CIU      "0.9929425"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8954299"
accuracy_optimism_corrected_CIL "0.8438571"
accuracy_optimism_corrected_CIU "0.930791" 
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "31"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.901005" 
auc_optimism_corrected_CIL      "0.8504941"
auc_optimism_corrected_CIU      "0.9534551"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8441077"
accuracy_optimism_corrected_CIL "0.785"    
accuracy_optimism_corrected_CIU "0.8924805"
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "17"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.9248253"
auc_optimism_corrected_CIL      "0.8775136"
auc_optimism_corrected_CIU      "0.9641348"
accuracy                        "1"        
accuracy_optimism_corrected     "0.875495" 
accuracy_optimism_corrected_CIL "0.8132086"
accuracy_optimism_corrected_CIU "0.9172763"
roc_c

2.5.2.2.3.2 Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "17"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.9665056"
auc_optimism_corrected_CIL      "0.9449662"
auc_optimism_corrected_CIU      "0.9863349"
accuracy                        "1"        
accuracy_optimism_corrected     "0.902696" 
accuracy_optimism_corrected_CIL "0.8441384"
accuracy_optimism_corrected_CIU "0.9399633"
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "85"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.8604571"
auc_optimism_corrected_CIL      "0.824759" 
auc_optimism_corrected_CIU      "0.9035517"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8070124"
accuracy_optimism_corrected_CIL "0.758049" 
accuracy_optimism_corrected_CIU "0.8571429"
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "31"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.9158303"
auc_optimism_corrected_CIL      "0.8472241"
auc_optimism_corrected_CIU      "0.9820129"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8678365"
accuracy_optimism_corrected_CIL "0.8040909"
accuracy_optimism_corrected_CIU "0.9498863"
roc_c

2.5.2.2.4 Gradient boosting
2.5.2.2.4.1 ASV level
level="ASV"

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9694065
auc_optimism_corrected_CIL        0.9465650
auc_optimism_corrected_CIU        0.9947741
accuracy                          1.0000000
accuracy_optimism_corrected       0.9138621
accuracy_optimism_corrected_CIL   0.8255102
accuracy_optimism_corrected_CIU   0.9639881
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         200.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.8818766
auc_optimism_corrected_CIL        0.8250024
auc_optimism_corrected_CIU        0.9331889
accuracy                          1.0000000
accuracy_optimism_corrected       0.8505813
accuracy_optimism_corrected_CIL   0.7753125
accuracy_optimism_corrected_CIU   0.9131829
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         100.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9152636
auc_optimism_corrected_CIL        0.8490806
auc_optimism_corrected_CIU        0.9591980
accuracy                          1.0000000
accuracy_optimism_corrected       0.8440457
accuracy_optimism_corrected_CIL   0.7779221
accuracy_optimism_corrected_CIU   0.9026667
roc_c

2.5.2.2.4.2 Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         100.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9704068
auc_optimism_corrected_CIL        0.9420501
auc_optimism_corrected_CIU        0.9912599
accuracy                          1.0000000
accuracy_optimism_corrected       0.9084848
accuracy_optimism_corrected_CIL   0.8403249
accuracy_optimism_corrected_CIU   0.9575208
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.8347726
auc_optimism_corrected_CIL        0.7841428
auc_optimism_corrected_CIU        0.8892752
accuracy                          1.0000000
accuracy_optimism_corrected       0.8039108
accuracy_optimism_corrected_CIL   0.7626480
accuracy_optimism_corrected_CIU   0.8542208
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9470524
auc_optimism_corrected_CIL        0.9072132
auc_optimism_corrected_CIU        0.9829447
accuracy                          1.0000000
accuracy_optimism_corrected       0.8996262
accuracy_optimism_corrected_CIL   0.8474900
accuracy_optimism_corrected_CIU   0.9373579
roc_c

2.5.2.3 Saving results

models_list <- list()

for (model_name in names(supplements_models$models_summ)){
  df <- do.call(rbind, supplements_models$models_summ[[model_name]])
  models_list[[model_name]] <- df
}

write.xlsx(models_list,
           file=file.path(path,paste0("supplements_models_",segment,".xlsx")),
           rowNames=TRUE)

2.6 Results overview

2.6.0.1 Alpha diversity

knitr::kable(pc_observed[[segment]],
             digits = 3,
             caption = "Results of linear model testing ASV Richness")
Results of linear model testing ASV Richness
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -53.642 11.742 -4.568 0.000 0.000 ***
healthy , pre_ltx - CZ vs NO 11.219 10.488 1.070 0.287 0.361
healthy vs Grouppre_ltx:CountryNO 6.313 15.403 0.410 0.683 0.683
pre_ltx vs Grouppost_ltx 25.623 11.366 2.254 0.025 0.057
pre_ltx , post_ltx - CZ vs NO 17.533 12.615 1.390 0.166 0.249
pre_ltx vs Grouppost_ltx:CountryNO -35.933 15.921 -2.257 0.025 0.057
healthy vs Grouppost_ltx -28.019 9.237 -3.033 0.003 0.012 *
healthy , post_ltx - CZ vs NO 11.219 11.267 0.996 0.321 0.361
healthy vs Grouppost_ltx:CountryNO -29.620 14.629 -2.025 0.044 0.080
knitr::kable(pc_shannon[[segment]],
             digits = 3,
             caption = "Results of linear model testing Shannon index")
Results of linear model testing Shannon index
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.505 0.159 -3.171 0.002 0.017 *
healthy , pre_ltx - CZ vs NO 0.007 0.142 0.046 0.963 0.963
healthy vs Grouppre_ltx:CountryNO 0.144 0.209 0.687 0.493 0.634
pre_ltx vs Grouppost_ltx 0.372 0.148 2.514 0.013 0.049 *
pre_ltx , post_ltx - CZ vs NO 0.150 0.164 0.915 0.361 0.542
pre_ltx vs Grouppost_ltx:CountryNO -0.502 0.207 -2.423 0.016 0.049 *
healthy vs Grouppost_ltx -0.134 0.111 -1.200 0.231 0.416
healthy , post_ltx - CZ vs NO 0.007 0.136 0.048 0.962 0.963
healthy vs Grouppost_ltx:CountryNO -0.358 0.176 -2.031 0.044 0.098
knitr::kable(pc_simpson[[segment]],
             digits = 3,
             caption = "Results of linear model testing Simpson index")
Results of linear model testing Simpson index
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.046 0.019 -2.481 0.014 0.129
healthy , pre_ltx - CZ vs NO -0.008 0.017 -0.466 0.642 0.692
healthy vs Grouppre_ltx:CountryNO 0.027 0.024 1.092 0.277 0.498
pre_ltx vs Grouppost_ltx 0.037 0.021 1.763 0.079 0.238
pre_ltx , post_ltx - CZ vs NO 0.019 0.023 0.811 0.418 0.627
pre_ltx vs Grouppost_ltx:CountryNO -0.060 0.029 -2.033 0.043 0.195
healthy vs Grouppost_ltx -0.009 0.016 -0.572 0.568 0.692
healthy , post_ltx - CZ vs NO -0.008 0.020 -0.396 0.692 0.692
healthy vs Grouppost_ltx:CountryNO -0.033 0.025 -1.303 0.194 0.436
knitr::kable(pc_pielou[[segment]],
             digits = 3,
             caption = "Results of linear model testing Pielou index")
Results of linear model testing Pielou index
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.040 0.024 -1.708 0.090 0.285
healthy , pre_ltx - CZ vs NO -0.008 0.021 -0.394 0.694 0.943
healthy vs Grouppre_ltx:CountryNO 0.010 0.031 0.332 0.740 0.943
pre_ltx vs Grouppost_ltx 0.042 0.022 1.922 0.056 0.285
pre_ltx , post_ltx - CZ vs NO 0.002 0.024 0.082 0.935 0.943
pre_ltx vs Grouppost_ltx:CountryNO -0.051 0.030 -1.677 0.095 0.285
healthy vs Grouppost_ltx 0.001 0.017 0.072 0.943 0.943
healthy , post_ltx - CZ vs NO -0.008 0.020 -0.407 0.684 0.943
healthy vs Grouppost_ltx:CountryNO -0.041 0.027 -1.525 0.129 0.290

Plots

alpha_div_plots[[paste(segment,"Country")]]

2.6.0.2 Beta diversity

Main analysis

knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
             digits = 3,
             caption = "Results of PERMANOVA - robust aitchison distance")
Results of PERMANOVA - robust aitchison distance
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 695.306 4.259 0.029 0.001 0.002 **
pre_ltx vs post_ltx 1 309.768 1.811 0.009 0.003 0.003 **
post_ltx vs healthy 1 914.409 5.454 0.025 0.001 0.002 **
pre_ltx vs healthy , Country 1 482.074 2.953 0.020 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 699.075 4.087 0.019 0.001 0.001 ***
post_ltx vs healthy , Country 1 728.020 4.342 0.020 0.001 0.001 ***
pre_ltx vs healthy : Country 1 237.598 1.460 0.010 0.010 0.030 *
pre_ltx vs post_ltx : Country 1 237.265 1.390 0.007 0.021 0.032 *
post_ltx vs healthy : Country 1 202.888 1.211 0.006 0.106 0.106

PCA

pca_plots_list[[paste(segment,"genus custom")]]

Supplementary analysis

knitr::kable(supplements_beta[!grepl("PCoA",names(supplements_beta)) & (grepl("genus",names(supplements_beta)))],
             digits = 3,
             caption = "Supplementary PERMANOVA results: Bray-curtis, Jaccard distances")
Supplementary PERMANOVA results: Bray-curtis, Jaccard distances
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 0.963 5.058 0.033 0.001 0.002 **
pre_ltx vs post_ltx 1 0.342 1.581 0.007 0.070 0.070
post_ltx vs healthy 1 1.551 7.718 0.034 0.001 0.002 **
pre_ltx vs healthy , Country 1 1.132 5.948 0.039 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 1.991 9.200 0.042 0.001 0.001 ***
post_ltx vs healthy , Country 1 1.656 8.242 0.037 0.001 0.001 ***
pre_ltx vs healthy : Country 1 0.449 2.380 0.016 0.002 0.006 **
pre_ltx vs post_ltx : Country 1 0.330 1.527 0.007 0.075 0.075
post_ltx vs healthy : Country 1 0.376 1.878 0.008 0.013 0.020 *
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 1.087 3.929 0.026 0.001 0.002 **
pre_ltx vs post_ltx 1 0.439 1.457 0.007 0.045 0.045 *
post_ltx vs healthy 1 1.633 5.675 0.026 0.001 0.002 **
pre_ltx vs healthy , Country 1 1.232 4.452 0.030 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 1.965 6.513 0.030 0.001 0.001 ***
post_ltx vs healthy , Country 1 1.751 6.085 0.028 0.001 0.001 ***
pre_ltx vs healthy : Country 1 0.525 1.912 0.013 0.002 0.006 **
pre_ltx vs post_ltx : Country 1 0.405 1.346 0.006 0.072 0.072
post_ltx vs healthy : Country 1 0.450 1.569 0.007 0.017 0.026 *

PCA

ggarrange(plotlist = supplements_beta[grepl("PCoA",names(supplements_beta))],
          labels=names(supplements_beta[grepl("PCoA",names(supplements_beta))]),
          font.label = list(size=5,face="plain"),
          ncol=2,nrow=3)

2.6.0.3 Univariate analysis

Number of significant taxa

knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
      as.data.frame(lapply(psc_effect,nrow))) %>% t() %>% 
  `colnames<-`("Count") %>% 
  `rownames<-`(c(names(list_intersections),"PSC effect ASV","PSC effect Genus","PSC effect Phylum")),caption="Number of significant taxa")
Number of significant taxa
Count
terminal_ileum genus healthy vs pre_ltx 25
terminal_ileum genus pre_ltx vs post_ltx 4
terminal_ileum genus healthy vs post_ltx 38
terminal_ileum ASV healthy vs pre_ltx 42
terminal_ileum ASV pre_ltx vs post_ltx 2
terminal_ileum ASV healthy vs post_ltx 65
terminal_ileum phylum healthy vs pre_ltx 1
terminal_ileum phylum pre_ltx vs post_ltx 0
terminal_ileum phylum healthy vs post_ltx 3
PSC effect ASV 15
PSC effect Genus 25
PSC effect Phylum 0

2.6.0.4 Machine learning

Main models

knitr::kable(models_summ_df_ileum %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
             digits=2,caption="Elastic net results")
Elastic net results
alpha lambda auc_optimism_corrected auc_optimism_corrected_CIL auc_optimism_corrected_CIU
pre_ltx vs healthy ASV terminal_ileum 0.4 0.03 0.96 0.89 1.00
pre_ltx vs post_ltx ASV terminal_ileum 0.2 0.13 0.86 0.79 0.93
post_ltx vs healthy ASV terminal_ileum 0.2 0.05 0.96 0.90 0.99
pre_ltx vs healthy genus terminal_ileum 0.2 0.04 0.97 0.92 1.00
pre_ltx vs post_ltx genus terminal_ileum 0.0 0.67 0.83 0.75 0.92
post_ltx vs healthy genus terminal_ileum 0.0 0.07 0.97 0.92 0.99

ROC - Genus level

roc_curve_all_custom(roc_cs[c(4:6)],Q="Q1",
                     model_name="enet_model")

Supplementary models

# Build final dataframe
models_list[["enet_model"]] <- models_summ_df_ileum
final_df <- tibble(row_names = rownames(models_list[[1]]))

# Loop through models and extract required values
for (model_name in names(models_list)) {
  model_df <- models_list[[model_name]]
  
  # Combine AUC_optimism_corrected with its CI values
  final_df[[model_name]] <- paste0(
    round(model_df$auc_optimism_corrected, 3), 
    " (", round(model_df$auc_optimism_corrected_CIL, 3), "; ", 
    round(model_df$auc_optimism_corrected_CIU, 3), ")"
  )
}

knitr::kable(final_df, caption="All models")
All models
row_names knn_model rf_model gbm_model enet_model_ra knn_model_ra rf_model_ra gbm_model_ra enet_model
pre_ltx vs healthy ASV terminal_ileum 0.871 (0.761; 0.954) 0.947 (0.873; 0.995) 0.955 (0.902; 0.992) 0.924 (0.837; 0.972) 0.829 (0.699; 0.932) 0.975 (0.95; 0.993) 0.969 (0.947; 0.995) 0.956 (0.894; 0.996)
pre_ltx vs post_ltx ASV terminal_ileum 0.686 (0.555; 0.794) 0.856 (0.756; 0.941) 0.868 (0.787; 0.935) 0.734 (0.617; 0.834) 0.666 (0.583; 0.736) 0.901 (0.85; 0.953) 0.882 (0.825; 0.933) 0.865 (0.785; 0.932)
post_ltx vs healthy ASV terminal_ileum 0.87 (0.78; 0.953) 0.9 (0.817; 0.971) 0.925 (0.852; 0.982) 0.888 (0.818; 0.94) 0.824 (0.743; 0.909) 0.925 (0.878; 0.964) 0.915 (0.849; 0.959) 0.958 (0.9; 0.99)
pre_ltx vs healthy genus terminal_ileum 0.887 (0.8; 0.96) 0.95 (0.877; 0.996) 0.973 (0.928; 1) 0.939 (0.896; 0.979) 0.819 (0.765; 0.874) 0.967 (0.945; 0.986) 0.97 (0.942; 0.991) 0.966 (0.916; 0.996)
pre_ltx vs post_ltx genus terminal_ileum 0.727 (0.608; 0.833) 0.824 (0.719; 0.906) 0.83 (0.74; 0.912) 0.737 (0.654; 0.812) 0.718 (0.66; 0.792) 0.86 (0.825; 0.904) 0.835 (0.784; 0.889) 0.834 (0.749; 0.915)
post_ltx vs healthy genus terminal_ileum 0.921 (0.857; 0.974) 0.912 (0.821; 0.979) 0.937 (0.868; 0.983) 0.857 (0.801; 0.894) 0.801 (0.734; 0.886) 0.916 (0.847; 0.982) 0.947 (0.907; 0.983) 0.966 (0.922; 0.994)

ROC - genus

rocs_list <- supplements_models$roc_cs
rocs_list[["enet_model"]] <- roc_cs
plot_list <- list()

for (model_name in names(rocs_list)) {
  plot_list[[model_name]] <- roc_curve_all_custom(rocs_list[[model_name]][c(4:6)],
                       Q="Q1",
                       model_name=model_name)
}

p <- ggarrange(plotlist = plot_list,labels = names(rocs_list),font.label = list(face="plain",size=5))
p

3 Data Analysis - Colon

segment="colon"

3.1 Filtering

Rules: - sequencing depth > 10000 - nearZeroVar() with default settings

Rarefaction Curve

path="../intermediate_files/rarecurves"
seq_depth_threshold <- 10000
ps <- construct_phyloseq(colon_asv_tab,colon_taxa_tab,colon_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_colon.Rdata"))
load(file.path(path,"rarefaction_colon.Rdata"))
seq_depth_threshold <- 10000
prare <- ggrarecurve(obj=rareres,
                      factorNames="Country",
                      indexNames=c("Observe")) + 
        theme_bw()+
        theme(axis.text=element_text(size=8), 
              panel.grid=element_blank(),
              strip.background = element_rect(colour=NA,fill="grey"),
              strip.text.x = element_text(face="bold")) + 
        geom_vline(xintercept = seq_depth_threshold, 
                   linetype="dashed", color = "red") + 
        xlim(0, 20000)
The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)
prare

Library size

read_counts(colon_asv_tab, line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.1 Sequencing depth

data_filt <- seq_depth_filtering(colon_asv_tab,
                                 colon_taxa_tab,
                                 colon_metadata,
                                 seq_depth_threshold = 10000)
Removing 75 ASV(s)
filt_colon_asv_tab <- data_filt[[1]]; alpha_colon_asv_tab <- filt_colon_asv_tab
filt_colon_taxa_tab <- data_filt[[2]]; alpha_colon_taxa_tab <- filt_colon_taxa_tab
filt_colon_metadata <- data_filt[[3]]; alpha_colon_metadata <- filt_colon_metadata

seq_step <- dim(filt_colon_asv_tab)[1]

Library size

read_counts(filt_colon_asv_tab,line = c(10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.2 NearZeroVar

data_filt <- nearzerovar_filtering(filt_colon_asv_tab,
                                   filt_colon_taxa_tab,
                                   filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]
filt_colon_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_colon_asv_tab)[1]

Library size

read_counts(filt_colon_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

Check zero depth

data_filt <- check_zero_depth(filt_colon_asv_tab, 
                              filt_colon_taxa_tab, 
                              filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]; 
filt_colon_taxa_tab <- data_filt[[2]]; 
filt_colon_metadata <- data_filt[[3]]; 

Library size

read_counts(filt_colon_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.3 Final Counts

final_counts_filtering(colon_asv_tab,
                       filt_colon_asv_tab,
                       filt_colon_metadata,
                       seq_step, 0, nearzero_step)
                                               V1
Raw data: ASVs                               6936
Raw data: Samples                             789
Sequencing depth filt: ASVs                  6861
Prevalence filt: ASVs                           0
NearZeroVar filt: ASVs                        381
Filt data: ASVs                               381
Filt data: Samples                            760
Filt data: Patients                           354
Filt data: Patients.1                           0
Filtered samples                               29
Matrices                    Cecum;Rectum;CD;CA;SI
healthy                                       161
non-rPSC                                        0
rPSC                                            0
pre_ltx                                       250
post_ltx                                      349
ETOH                                            0

3.2 Alpha diversity

Calculating Richness, Shannon, Simpson, Pielou indexes on raw (unfiltered) rarefied data. Samples with sequencing depth < 10000 were excluded. Testing by linear mixed-effect model.

path = "../results/Q1/alpha_diversity"

Calculation

# Construct MPSE object
alpha_colon_metadata$Sample <- alpha_colon_metadata$SampleID
colon_mpse <- as.MPSE(construct_phyloseq(alpha_colon_asv_tab,
                                         alpha_colon_taxa_tab,
                                         alpha_colon_metadata))

colon_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)

# Calculate alpha diversity - rarefied counts
colon_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)
alpha_data <- data.frame(SampleID=colon_mpse$Sample.x,
                         Observe=colon_mpse$Observe,
                         Shannon=colon_mpse$Shannon,
                         Simpson=colon_mpse$Simpson,
                         Pielou=colon_mpse$Pielou,
                         Group=colon_mpse$Group,
                         Country=colon_mpse$Country,
                         Patient=colon_mpse$Patient)

write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
          row.names = FALSE)

3.2.1 Plots

p_boxplot_alpha <- alpha_diversity_countries(alpha_data)
Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha

# see the results
p_boxplot_alpha

3.2.2 Linear Model

path = "../results/Q1/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))

Richness

results_model <- pairwise.lmer(
  formula = "Observe ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_observe <- results_model[[1]]
  results_model_observe_detailed <- results_model[[2]]
} else {
  results_model_observe <- results_model
  results_model_observe_detailed <- NA
}

# save the results
pc_observed[[segment]] <- results_model_observe
# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")
Raw results of linear model of richness estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -56.230 11.738 195.570 -4.791 0.000 0.000 ***
healthy vs pre_ltx - CZ vs NO 14.193 10.341 194.538 1.372 0.171 0.256
healthy vs Grouppre_ltx:CountryNO -7.489 15.169 194.358 -0.494 0.622 0.622
pre_ltx vs Grouppost_ltx 28.886 11.811 267.502 2.446 0.015 0.045 *
pre_ltx vs post_ltx - CZ vs NO 6.742 12.174 265.784 0.554 0.580 0.622
pre_ltx vs Grouppost_ltx:CountryNO -24.047 15.112 262.828 -1.591 0.113 0.203
healthy vs Grouppost_ltx -27.299 8.699 250.458 -3.138 0.002 0.009 **
healthy vs post_ltx - CZ vs NO 14.360 11.153 251.729 1.288 0.199 0.256
healthy vs Grouppost_ltx:CountryNO -31.681 14.195 247.531 -2.232 0.027 0.060
knitr::kable(results_model_observe_detailed,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 171.780 7.001 162.548 24.535 0.000 NA
Grouppost_ltx -27.276 8.558 161.345 -3.187 0.002 0.003
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_NO 186.110 8.829 88.980 21.080 0 NA
Grouppost_ltx -58.943 11.539 87.242 -5.108 0 0
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 171.793 6.342 89.560 27.090 0.000 NA
CountryNO 14.358 9.941 89.408 1.444 0.152 0.152
Estimate Std. Error df t value Pr(>|t|) p.adjusted
post_ltx_CZ 144.490 5.283 158.786 27.352 0.000 NA
CountryNO -17.319 9.272 154.970 -1.868 0.064 0.085

Shannon

results_model <- pairwise.lmer(
  formula = "Shannon ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_shannon <- results_model[[1]]
  results_model_shannon_detailed <- results_model[[2]]
} else {
  results_model_shannon <- results_model
  results_model_shannon_detailed <- NA
}

# save the results
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)
# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")
Raw results of linear model of Shannon estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.585 0.143 197.588 -4.084 0.000 0.001 ***
healthy vs pre_ltx - CZ vs NO 0.015 0.126 196.213 0.116 0.907 0.909
healthy vs Grouppre_ltx:CountryNO 0.081 0.185 195.965 0.437 0.663 0.852
pre_ltx vs Grouppost_ltx 0.364 0.158 265.003 2.307 0.022 0.056
pre_ltx vs post_ltx - CZ vs NO 0.095 0.163 263.448 0.585 0.559 0.839
pre_ltx vs Grouppost_ltx:CountryNO -0.455 0.202 260.765 -2.253 0.025 0.056
healthy vs Grouppost_ltx -0.220 0.102 247.586 -2.153 0.032 0.058
healthy vs post_ltx - CZ vs NO 0.015 0.131 248.759 0.114 0.909 0.909
healthy vs Grouppost_ltx:CountryNO -0.376 0.167 244.904 -2.255 0.025 0.056
knitr::kable(results_model_shannon_detailed,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
Estimate Std. Error df t value Pr(>|t|) p.adjusted
pre_ltx_CZ 3.102 0.144 131.105 21.535 0.000 NA
Grouppost_ltx 0.363 0.160 130.742 2.275 0.025 0.049
Estimate Std. Error df t value Pr(>|t|) p.adjusted
pre_ltx_NO 3.197 0.078 129.531 41.004 0.000 NA
Grouppost_ltx -0.091 0.125 129.050 -0.732 0.466 0.563
Estimate Std. Error df t value Pr(>|t|) p.adjusted
pre_ltx_CZ 3.101 0.143 105.863 21.629 0.000 NA
CountryNO 0.095 0.164 104.934 0.580 0.563 0.563
Estimate Std. Error df t value Pr(>|t|) p.adjusted
post_ltx_CZ 3.466 0.068 156.394 51.151 0.000 NA
CountryNO -0.360 0.119 153.436 -3.025 0.003 0.012
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 3.686 0.082 159.084 44.805 0.00 NA
Grouppost_ltx -0.220 0.101 158.215 -2.190 0.03 0.04
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_NO 3.701 0.104 90.164 35.654 0 NA
Grouppost_ltx -0.596 0.136 88.069 -4.396 0 0
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 3.685 0.054 89.090 68.738 0.000 NA
CountryNO 0.017 0.084 88.849 0.205 0.838 0.838
Estimate Std. Error df t value Pr(>|t|) p.adjusted
post_ltx_CZ 3.466 0.068 156.394 51.151 0.000 NA
CountryNO -0.360 0.119 153.436 -3.025 0.003 0.006

Simpson

results_model <- pairwise.lmer(
  formula = "Simpson ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_simpson <- results_model[[1]]
  results_model_simpson_detailed <- results_model[[2]]
} else {
  results_model_simpson <- results_model
  results_model_simpson_detailed <- NA
}

# save the results
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)
# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")
Raw results of linear model of Simpson estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.059 0.019 199.853 -3.066 0.002 0.022 *
healthy vs pre_ltx - CZ vs NO -0.007 0.017 198.127 -0.430 0.667 0.738
healthy vs Grouppre_ltx:CountryNO 0.020 0.025 197.803 0.813 0.417 0.626
pre_ltx vs Grouppost_ltx 0.031 0.026 263.530 1.213 0.226 0.455
pre_ltx vs post_ltx - CZ vs NO 0.013 0.027 261.964 0.476 0.635 0.738
pre_ltx vs Grouppost_ltx:CountryNO -0.052 0.033 259.262 -1.556 0.121 0.363
healthy vs Grouppost_ltx -0.027 0.017 246.082 -1.594 0.112 0.363
healthy vs post_ltx - CZ vs NO -0.007 0.022 247.135 -0.335 0.738 0.738
healthy vs Grouppost_ltx:CountryNO -0.032 0.028 243.697 -1.147 0.253 0.455
knitr::kable(results_model_simpson_detailed,digits = 3,
caption = "Raw results of independent country analysis")

Table: Raw results of independent country analysis

Pielou

results_model <- pairwise.lmer(
  formula = "Pielou ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_pielou <- results_model[[1]]
  results_model_pielou_detailed <- results_model[[2]]
} else {
  results_model_pielou <- results_model
  results_model_pielou_detailed <- NA
}

# save the results
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)
# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")
Raw results of linear model of Pielou estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.057 0.020 201.130 -2.828 0.005 0.046 *
healthy vs pre_ltx - CZ vs NO -0.008 0.018 199.017 -0.446 0.656 0.781
healthy vs Grouppre_ltx:CountryNO 0.015 0.026 198.602 0.565 0.573 0.781
pre_ltx vs Grouppost_ltx 0.039 0.024 265.797 1.631 0.104 0.234
pre_ltx vs post_ltx - CZ vs NO 0.007 0.025 263.735 0.274 0.785 0.785
pre_ltx vs Grouppost_ltx:CountryNO -0.062 0.031 260.213 -2.015 0.045 0.202
healthy vs Grouppost_ltx -0.018 0.016 247.076 -1.130 0.260 0.467
healthy vs post_ltx - CZ vs NO -0.008 0.020 248.451 -0.394 0.694 0.781
healthy vs Grouppost_ltx:CountryNO -0.047 0.026 243.881 -1.828 0.069 0.206
knitr::kable(results_model_pielou_detailed,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
Estimate Std. Error df t value Pr(>|t|) p.adjusted
pre_ltx_CZ 0.664 0.022 130.842 29.916 0.000 NA
Grouppost_ltx 0.039 0.025 130.452 1.582 0.116 0.232
Estimate Std. Error df t value Pr(>|t|) p.adjusted
pre_ltx_NO 0.670 0.012 129.502 57.528 0.000 NA
Grouppost_ltx -0.023 0.019 128.819 -1.219 0.225 0.3
Estimate Std. Error df t value Pr(>|t|) p.adjusted
pre_ltx_CZ 0.664 0.020 107.760 32.578 0.000 NA
CountryNO 0.007 0.023 106.251 0.291 0.772 0.772
Estimate Std. Error df t value Pr(>|t|) p.adjusted
post_ltx_CZ 0.703 0.011 155.706 65.557 0.000 NA
CountryNO -0.055 0.019 152.292 -2.919 0.004 0.016
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 0.721 0.013 158.287 55.872 0.000 NA
Grouppost_ltx -0.018 0.016 157.427 -1.146 0.254 0.338
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_NO 0.713 0.016 92.108 44.931 0.000 NA
Grouppost_ltx -0.065 0.021 89.257 -3.151 0.002 0.008
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 0.720 0.007 89.312 100.415 0.000 NA
CountryNO -0.008 0.011 88.968 -0.675 0.502 0.502
Estimate Std. Error df t value Pr(>|t|) p.adjusted
post_ltx_CZ 0.703 0.011 155.706 65.557 0.000 NA
CountryNO -0.055 0.019 152.292 -2.919 0.004 0.008

3.2.3 Saving results

alpha_list <- list(
  Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
  Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
  Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
  Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
                   
write.xlsx(alpha_list, 
           file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))

3.3 Beta diversity

Calculating Aitchison distance (euclidean distance on clr-transformed data) at genus level (main analysis). In supplements, there are also bray-curtis and Jaccard distance at ASV and genus levels. Testing by PERMANOVA.

3.3.1 Main analysis

Genus level, Aitchison distance

level="genus"
path = "../results/Q1/beta_diversity"

Aggregation, filtering

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)

filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             colon_metadata,
                            seq_depth_threshold=10000)
Removing 10 ASV(s)
filt_colon_genus_tab <- filt_data[[1]]
filt_colon_genus_taxa <- filt_data[[2]]
filt_colon_genus_metadata <- filt_data[[3]]

3.3.2 PERMANOVA

pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
                           covariate = filt_colon_genus_metadata$Country, 
                           patients = filt_colon_genus_metadata$Patient,
                           sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
                          covariate = filt_colon_genus_metadata$Country, 
                          interaction = TRUE, 
                          patients = filt_colon_genus_metadata$Patient,
                          sim.method = "robust.aitchison", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <-rbind(pp_factor,pp_cov,pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 1222.921 7.950 0.019 0.001 0.002 **
pre_ltx vs post_ltx 1 630.146 4.062 0.007 0.008 0.008 **
post_ltx vs healthy 1 1736.169 11.014 0.021 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy , Country 1 738.576 4.801 0.011 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 1532.675 9.879 0.016 0.001 0.001 ***
post_ltx vs healthy , Country 1 1728.746 10.966 0.021 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy : Country 1 318.719 2.077 0.005 0.325 0.488
pre_ltx vs post_ltx : Country 1 322.405 2.082 0.003 0.524 0.524
post_ltx vs healthy : Country 1 398.820 2.538 0.005 0.112 0.336

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.1]

if (length(interaction_sig)>0){
 for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_genus_metadata$Group,
                      covariate = filt_colon_genus_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      patients = filt_colon_genus_metadata$Patient)
  print(result_list)
} 
}
3.3.2.0.1 Plots
p <- pca_plot_custom(filt_colon_genus_tab,
                                 filt_colon_genus_taxa,
                                 filt_colon_genus_metadata,
                                 show_boxplots = TRUE,
                                 variable = "Group", size=2, 
                                 show_legend= FALSE)

# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p

# see the results
p

3.3.2.1 Saving results

write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]], 
           file = file.path(path,
           paste0("beta_diversity_results_", segment,".xlsx")))

3.3.3 Supplementary analysis

3.3.3.1 Genus level

level="genus"
3.3.3.1.1 Bray-Curtis

PERMANOVA

pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_colon_genus_metadata$Group,
                           covariate = filt_colon_genus_metadata$Country, 
                           patients = filt_colon_genus_metadata$Patient,
                           sim.method = "bray", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_colon_genus_metadata$Group,
                          covariate = filt_colon_genus_metadata$Country, 
                          patients = filt_colon_genus_metadata$Patient,
                          interaction = TRUE, sim.method = "bray", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 2.119 10.732 0.024 0.001 0.001 ***
pre_ltx vs post_ltx 1 1.307 6.239 0.010 0.001 0.001 ***
post_ltx vs healthy 1 3.286 17.150 0.031 0.001 0.001 ***
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy , Country 1 2.862 14.497 0.033 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 6.618 31.603 0.049 0.001 0.001 ***
post_ltx vs healthy , Country 1 5.496 28.682 0.052 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy : Country 1 0.783 3.993 0.009 0.012 0.018 *
pre_ltx vs post_ltx : Country 1 0.562 2.691 0.004 0.112 0.112
post_ltx vs healthy : Country 1 0.998 5.252 0.009 0.002 0.006 **

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_genus_metadata$Group,
                      covariate = filt_colon_genus_metadata$Country, 
                      patients = filt_colon_genus_metadata$Patient,
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'bray')
  print(result_list)
}
$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs     R2      F Pr(>F) p.adjusted
Fac        1    0.680 0.0271 3.7602   0.05       0.05
Residual 135   24.413 0.9729                         
Total    136   25.093 1.0000                         

$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    2.222 0.03858 10.915  0.001  0.0013333
Residual 272   55.360 0.96142                         
Total    273   57.582 1.00000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs     R2      F Pr(>F) p.adjusted
Cov        1    2.166 0.0386 9.9564  0.001  0.0013333
Residual 248   53.948 0.9614                         
Total    249   56.114 1.0000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1   1.4791 0.05417 9.1064  0.001  0.0013333
Residual 159  25.8258 0.94583                         
Total    160  27.3049 1.00000                         

$post_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    1.302 0.02297 7.0989  0.001      0.001
Residual 302   55.371 0.97703                         
Total    303   56.673 1.00000                         

$post_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    2.982 0.06786 14.924  0.001      0.001
Residual 205   40.966 0.93214                         
Total    206   43.949 1.00000                         

$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    5.014 0.06639 24.748  0.001      0.001
Residual 348   70.512 0.93361                         
Total    349   75.526 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1   1.4791 0.05417 9.1064  0.001      0.001
Residual 159  25.8258 0.94583                         
Total    160  27.3049 1.00000                         

Plots

p <- pca_plot_custom(filt_colon_genus_tab,
                                 filt_colon_genus_taxa,
                                 filt_colon_genus_metadata,
                                 measure = "bray",
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend= FALSE)

# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p

# see the results
p

3.3.3.1.2 Jaccard

PERMANOVA

pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_colon_genus_metadata$Group,
                           covariate = filt_colon_genus_metadata$Country,
                           patients = filt_colon_genus_metadata$Patient,
                           sim.method = "jaccard", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_colon_genus_metadata$Group,
                          covariate = filt_colon_genus_metadata$Country,
                          patients = filt_colon_genus_metadata$Patient,
                          interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 2.225 7.844 0.018 0.001 0.002 **
pre_ltx vs post_ltx 1 1.382 4.680 0.007 0.002 0.002 **
post_ltx vs healthy 1 3.371 12.044 0.022 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy , Country 1 3.013 10.622 0.025 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 6.474 21.918 0.035 0.001 0.001 ***
post_ltx vs healthy , Country 1 5.595 19.994 0.037 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy : Country 1 0.939 3.330 0.008 0.010 0.015 *
pre_ltx vs post_ltx : Country 1 0.696 2.362 0.004 0.163 0.163
post_ltx vs healthy : Country 1 1.134 4.078 0.008 0.001 0.003 **

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_genus_metadata$Group,
                      covariate = filt_colon_genus_metadata$Country, 
                      patients = filt_colon_genus_metadata$Patient,
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'jaccard')
  print(result_list)
}
$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    0.759 0.02074 2.8587  0.061      0.061
Residual 135   35.842 0.97926                         
Total    136   36.601 1.00000                         

$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    2.405 0.02957 8.2869  0.001  0.0013333
Residual 272   78.937 0.97043                         
Total    273   81.342 1.00000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    2.196 0.02856 7.2916  0.001  0.0013333
Residual 248   74.696 0.97144                         
Total    249   76.892 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    1.756 0.04196 6.9641  0.001  0.0013333
Residual 159   40.083 0.95804                         
Total    160   41.839 1.00000                         

$post_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    1.456 0.01743 5.3572  0.001      0.001
Residual 302   82.098 0.98257                         
Total    303   83.554 1.00000                         

$post_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    3.049 0.04918 10.604  0.001      0.001
Residual 205   58.937 0.95082                         
Total    206   61.986 1.00000                         

$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    4.974 0.04696 17.147  0.001      0.001
Residual 348  100.952 0.95304                         
Total    349  105.926 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    1.756 0.04196 6.9641  0.001      0.001
Residual 159   40.083 0.95804                         
Total    160   41.839 1.00000                         

Plots

Custom

p <- pca_plot_custom(filt_colon_genus_tab,
                                 filt_colon_genus_taxa,
                                 filt_colon_genus_metadata,
                                 measure = "jaccard",
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend= FALSE)

# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p

# see the results
p

3.3.3.2 ASV level

level="ASV"
3.3.3.2.1 Aitchison
# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(x=pairwise_df,
                          filt_colon_metadata$Group,
                           covariate = filt_colon_metadata$Country, 
                           sim.method = "robust.aitchison", 
                           p.adjust.m="BH",
                           patients = filt_colon_metadata$Patient)

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_metadata$Group,
                          covariate = filt_colon_metadata$Country, 
                          interaction = TRUE, 
                          sim.method = "robust.aitchison", 
                          p.adjust.m="BH",
                          patients = filt_colon_metadata$Patient)

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("aitchison",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)

# see the results
pp_factor
                pairs Df SumsOfSqs   F.Model          R2 p.value p.adjusted sig
1  pre_ltx vs healthy  1  1711.535  7.089160 0.016854638   0.001     0.0015  **
2 pre_ltx vs post_ltx  1   799.824  3.515868 0.005780561   0.023     0.0230   *
3 post_ltx vs healthy  1  2419.630 10.210591 0.019444444   0.001     0.0015  **
pp_cov
                          pairs Df SumsOfSqs  F.Model         R2 p.value
1  pre_ltx vs healthy , Country  1  948.4242 3.928362 0.00933977   0.001
2 pre_ltx vs post_ltx , Country  1 1595.1673 7.012040 0.01152874   0.001
3 post_ltx vs healthy , Country  1 1870.9425 7.895186 0.01503512   0.001
  p.adjusted sig
1      0.001 ***
2      0.001 ***
3      0.001 ***
pp_fac.cov
                          pairs Df SumsOfSqs  F.Model          R2 p.value
1  pre_ltx vs healthy : Country  1  571.9973 2.377204 0.005632842   0.086
2 pre_ltx vs post_ltx : Country  1  439.3379 1.934270 0.003175223   0.768
3 post_ltx vs healthy : Country  1  626.6210 2.652897 0.005035604   0.023
  p.adjusted sig
1      0.129    
2      0.768    
3      0.069    
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 1711.535 7.089 0.017 0.001 0.002 **
pre_ltx vs post_ltx 1 799.824 3.516 0.006 0.023 0.023 *
post_ltx vs healthy 1 2419.630 10.211 0.019 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy , Country 1 948.424 3.928 0.009 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 1595.167 7.012 0.012 0.001 0.001 ***
post_ltx vs healthy , Country 1 1870.942 7.895 0.015 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy : Country 1 571.997 2.377 0.006 0.086 0.129
pre_ltx vs post_ltx : Country 1 439.338 1.934 0.003 0.768 0.768
post_ltx vs healthy : Country 1 626.621 2.653 0.005 0.023 0.069

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
 for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_metadata$Group,
                      covariate = filt_colon_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      patients = filt_colon_metadata$Patient)
  print(result_list)
} 
}

PCoA

p <- pca_plot_custom(filt_colon_asv_tab,
                           filt_colon_taxa_tab,
                           filt_colon_metadata,
                           show_boxplots = TRUE,
                           variable = "Group", 
                           size=3, 
                           show_legend= FALSE)

# save the results
supplements_beta[[paste("PCoA aitchison",level,segment)]] <- p

# see the results
p

3.3.3.2.2 Bray-Curtis

PERMANOVA

# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_colon_metadata$Group,
                           covariate = filt_colon_metadata$Country,
                           patients = filt_colon_metadata$Patient,
                           sim.method = "bray", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_colon_metadata$Group,
                          covariate = filt_colon_metadata$Country, 
                          patients = filt_colon_metadata$Patient,
                          interaction = TRUE, sim.method = "bray", p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 3.090 9.636 0.023 0.001 0.001 ***
pre_ltx vs post_ltx 1 1.948 5.972 0.010 0.001 0.001 ***
post_ltx vs healthy 1 4.788 15.423 0.029 0.001 0.001 ***
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy , Country 1 2.383 7.429 0.017 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 5.328 16.338 0.026 0.001 0.001 ***
post_ltx vs healthy , Country 1 4.912 15.825 0.029 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy : Country 1 0.901 2.821 0.007 0.040 0.060
pre_ltx vs post_ltx : Country 1 0.830 2.551 0.004 0.131 0.131
post_ltx vs healthy : Country 1 1.362 4.417 0.008 0.003 0.009 **

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_metadata$Group,
                      covariate = filt_colon_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      patients = filt_colon_metadata$Patient,
                      sim.method = 'bray')
  print(result_list)
}
$post_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    2.098 0.02219 6.8301  0.001      0.001
Residual 301   92.461 0.97781                         
Total    302   94.559 1.00000                         

$post_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    4.052 0.05993 13.068  0.001      0.001
Residual 205   63.559 0.94007                         
Total    206   67.611 1.00000                         

$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    4.574 0.03999 14.456  0.001      0.001
Residual 347  109.796 0.96001                         
Total    348  114.370 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    1.700 0.03547 5.8478  0.001      0.001
Residual 159   46.225 0.96453                         
Total    160   47.925 1.00000                         

PCoA

p <- pca_plot_custom(filt_colon_asv_tab,
                     filt_colon_taxa_tab,
                     filt_colon_metadata,
                     measure = "bray",
                     show_boxplots = TRUE,
                     variable = "Group", size=3, show_legend= FALSE)

# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p

# see the results
p

3.3.3.2.3 Jaccard

PERMANOVA

# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_colon_metadata$Group,
                           covariate = filt_colon_metadata$Country,
                           patients = filt_colon_metadata$Patient,
                           sim.method = "jaccard", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_colon_metadata$Group,
                          covariate = filt_colon_metadata$Country, 
                          patients = filt_colon_metadata$Patient,
                          interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 2.453 6.302 0.015 0.001 0.001 ***
pre_ltx vs post_ltx 1 1.695 4.312 0.007 0.001 0.001 ***
post_ltx vs healthy 1 3.721 9.734 0.018 0.001 0.001 ***
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy , Country 1 2.005 5.151 0.012 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 4.185 10.649 0.017 0.001 0.001 ***
post_ltx vs healthy , Country 1 3.933 10.288 0.020 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy : Country 1 0.906 2.336 0.006 0.086 0.129
pre_ltx vs post_ltx : Country 1 0.862 2.199 0.004 0.291 0.291
post_ltx vs healthy : Country 1 1.291 3.393 0.006 0.004 0.012 *

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_metadata$Group,
                      covariate = filt_colon_metadata$Country, 
                      patients = filt_colon_metadata$Patient,
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'jaccard')
  print(result_list)
}
$post_ltx_healthy_CZ
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    1.758 0.01512 4.6216  0.001      0.001
Residual 301  114.465 0.98488                         
Total    302  116.223 1.00000                         

$post_ltx_healthy_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    3.254 0.04003 8.5476  0.001      0.001
Residual 205   78.052 0.95997                         
Total    206   81.306 1.00000                         

$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1     3.68 0.02673 9.5297  0.001      0.001
Residual 347   134.00 0.97327                         
Total    348   137.68 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs     R2      F Pr(>F) p.adjusted
Cov        1    1.544 0.0257 4.1941  0.001      0.001
Residual 159   58.516 0.9743                         
Total    160   60.059 1.0000                         

PCoA

p <- pca_plot_custom(filt_colon_asv_tab,
                     filt_colon_taxa_tab,
                     filt_colon_metadata,
                     measure = "jaccard",
                     show_boxplots = TRUE,
                     variable = "Group", size=3, show_legend= FALSE)

# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p

# see the results
p

3.3.3.3 Saving results

write.xlsx(supplements_beta[!grepl("PCoA",names(supplements_beta))],
           file = file.path(path,
           paste0("supplements_beta_diversity_", segment,".xlsx")))

3.4 Univariate Analysis

Using two DAA tools - linDA and Maaslin2. Only the intersection of sets selected by each tool was shown to minimize false positives. Taxa with a significant interaction effect were excluded based on individual post-hoc analysis of the Czech and Norwegian cohorts. Only taxa with significant log fold change that showed the same change direction in both countries were retained.

Main analysis is performed at genus level. In supplementary analysis, it is performed at ASV and phylum levels.

3.4.1 Main analysis

level="genus"
# needed paths
path = "../results/Q1/univariate_analysis"
path_maaslin=file.path("../intermediate_files/maaslin/Q1",level)

# variables
raw_linda_results_genus[[segment]] <- list()
linda_results_genus[[segment]] <- list()

# country and interaction problems
list_country_union <- list()
list_venns <- list()

# workbook for final df
wb <- createWorkbook()

# PSC effect
psc_effect <- list()

3.4.1.1 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

colon_genus_asv_taxa_tab <- create_asv_taxa_table(colon_genus_tab,
                                                  colon_genus_taxa_tab)
3.4.1.1.1 pre_ltx vs healthy
3.4.1.1.1.1 linDA
group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 135 ASV(s)
Removing 10 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  411  samples and  148  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")


for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.1.1.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

3.4.1.1.1.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.1.1.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
3.4.1.1.1.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.1.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
3.4.1.1.2.1 linDA
# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 17 ASV(s)
Removing 10 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  600  samples and  130  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.1.2.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

3.4.1.1.2.3 Group - Intersection
intersection_results <- group_intersection(group, 
                                           list_intersections, 
                                           list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.1.2.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
3.4.1.1.2.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.1.1.3 post_ltx vs healthy
group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
3.4.1.1.3.1 linDA
# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,group,
                            usage="linDA")
Removing 71 ASV(s)
Removing 5 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  511  samples and  153  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.1.3.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

3.4.1.1.3.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.1.3.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
3.4.1.1.3.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.1.1.4 Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$colon[grepl(level,names(uni_statistics$colon))],
                                      colon_taxa_tab) + xlab("") + ylab("")
min_clr -2.221209 
max_clr 6.886638 
min_log -3.771476 
max_log 5.049706 
dotheatmap_linda

Horizontal bar plot

p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))
Using SeqID as id variables
p_prevalence_final <- ggarrange(p_prevalence,
                                ggplot() + theme_minimal(),
                                nrow = 2,heights = c(1,0.09))
p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),p_prevalence_final,ncol=2,widths = c(1,0.3))

dot_heatmap_colon <- p
3.4.1.1.5 PSC effect

To get only PSC associated taxa, we can intersect differentially abundant taxa from the previous analyses (see diagram below).

pre_LTx vs HC and Post_LTx vs HC intersection

imagename
A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)

psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
  
# see the results
psc_effect[[paste(segment,level)]] 
                           SeqID
3                         Rothia
15                   Barnesiella
18                 Butyricimonas
19                   Odoribacter
27                     Alistipes
28               Parabacteroides
39                    Holdemania
57                   Coprococcus
61              Fusicatenibacter
64                    Hungatella
65             Lachnoclostridium
67  Lachnospiraceae_FCS020_group
89              Colidextribacter
91                Intestinimonas
93                 Oscillibacter
109             Negativibacillus
122     Family_XIII_AD3011_group
133                    Dialister
135                  Veillonella
143                 Enterococcus
146                   Klebsiella
148                  Pseudomonas
                                                                                                                          Taxonomy
3                                   k__Bacteria;p__Actinobacteriota;c__Actinobacteria;o__Micrococcales;f__Micrococcaceae;g__Rothia
15                                   k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Barnesiellaceae;g__Barnesiella
18                                  k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Butyricimonas
19                                    k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Odoribacter
27                                       k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes
28                                k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides
39                                 k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Holdemania
57                                     k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus
61                                k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter
64                                      k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Hungatella
65                               k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium
67                    k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_FCS020_group
89                              k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Colidextribacter
91                                k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Intestinimonas
93                                 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter
109                              k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Negativibacillus
122 k__Bacteria;p__Firmicutes;c__Clostridia;o__Peptostreptococcales-Tissierellales;f__Anaerovoracaceae;g__Family_XIII_AD3011_group
133                   k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Dialister
135                 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella
143                                     k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus
146                   k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella
148                     k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas
    log2FoldChange      p_value         padj
3         1.842842 1.452081e-05 1.953709e-04
15       -3.590912 7.871469e-05 6.852808e-04
18       -3.002837 3.969499e-06 7.343572e-05
19       -3.406707 3.432387e-07 1.693311e-05
27       -3.110558 1.594988e-05 1.967152e-04
28       -3.568149 2.950689e-06 6.238599e-05
39       -1.174869 7.087889e-03 2.439553e-02
57       -3.357534 5.546126e-06 8.208266e-05
61       -1.917547 1.097200e-02 3.455012e-02
64        2.775798 1.422148e-04 1.107779e-03
65       -1.736481 1.812625e-03 8.383390e-03
67       -2.111851 1.228192e-03 6.268013e-03
89       -1.624469 8.895010e-03 2.991958e-02
91       -1.592037 6.335927e-03 2.404403e-02
93       -2.063990 9.829922e-04 5.538365e-03
109      -1.649295 1.442001e-02 4.268322e-02
122      -1.590837 1.010377e-03 5.538365e-03
133       2.144460 1.055678e-02 3.396528e-02
135       3.864870 1.310501e-07 9.697706e-06
143       5.049706 2.872348e-10 4.251075e-08
146       2.288824 1.471023e-03 7.257045e-03
148       1.318798 5.015789e-03 2.006316e-02

3.4.1.2 Saving results

# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
             overwrite = TRUE)

# PSC effect
write.xlsx(psc_effect[[paste(segment,level)]],file.path(path,paste0("psc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa

write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("significant_taxa_",segment,".xlsx")))

3.4.2 Supplementary Analysis

3.4.2.1 ASV level

level="ASV"
path_maaslin="../intermediate_files/maaslin/Q1/ASV/"
raw_linda_results[[segment]] <- list()
linda_results[[segment]] <- list()
supplements_wb <- createWorkbook()
3.4.2.1.1 pre_ltx vs healthy
group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA

# prepare the data
linda_data <- binomial_prep(colon_asv_tab,
                            colon_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 2138 ASV(s)
Removing 70 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata, 
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  411  samples and  368  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2

Volcano plot

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

# see the results
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                                                                      segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                                    filt_colon_uni_data,
                                                    filt_colon_uni_metadata,
                                                    segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.2.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA

# prepare the data
linda_data <- binomial_prep(colon_asv_tab,colon_taxa_tab,
                            colon_metadata,group, usage="linDA")
Removing 1157 ASV(s)
Removing 52 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata, 
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  599  samples and  249  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

# see the results
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment=segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                               segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.2.1.3 post_ltx vs healthy
group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA

# prepare the data
linda_data <- binomial_prep(colon_asv_tab,
                            colon_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 1096 ASV(s)
Removing 50 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  510  samples and  370  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.2.1.4 Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,uni_statistics$colon[grepl(level,names(uni_statistics$colon))],colon_taxa_tab)
min_clr -1.270472 
max_clr 5.320228 
min_log -4.431053 
max_log 4.150355 
dotheatmap_linda

3.4.2.1.5 PSC effect

pre_LTx vs Healthy and Post_LTx vs Healthy intersection

imagename
A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)

psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
  
# see the results
psc_effect[[paste(segment,level)]] 
                                                                                                                                                                                                                                                                                                                                                                                                               SeqID
10  TAGGGAATCTTCGGCAATGGGGGCAACCCTGACCGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTAAGTCAAGAACGAGTGTGAGAGTGGAAAGTTCACACTGTGACGGTAGCTTACCAGAAAGGGACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTCCCGAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTGATAAGTCTGAAGTTAAAGGCTGTGGCTCAACCATAGTTCGCTTTGGAAACTGTCAAACTTGAGTGCAGAAGGGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCGGTGGCGAAAGCGGCTCTCTGGTCTGTAACTGACGCT
15  TAGGGAATTTTCGGCAATGGGGGAAACCCTGACCGAGCAACGCCGCGTGAAGGAAGAAGTAATTCGTTATGTAAACTTCTGTCATAGAGGAAGAACGGTGGATATAGGGAATGATATCCAAGTGACGGTACTCTATAAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCGAGCGTTATCCGGAATTATTGGGCGTAAAGAGGGAGCAGGCGGCACTAAGGGTCTGTGGTGAAAGATCGAAGCTTAACTTCGGTAAGCCATGGAAACCGTAGAGCTAGAGTGTGTGAGAGGATCGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGACGATCTGGCGCATAACTGACGCTC
20  TAGGGAATTTTCGTCAATGGGGGAAACCCTGAACGAGCAATGCCGCGTGAGTGAAGAAGGTCTTCGGATCGTAAAGCTCTGTTGTAAGTGAAGAACGGCTCATAGAGGAAATGCTATGGGAGTGACGGTAGCTTACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATCATTGGGCGTAAAGGGTGCGTAGGTGGCGTACTAAGTCTGTAGTAAAAGGCAATGGCTCAACCATTGTAAGCTATGGAAACTGGTATGCTGGAGTGCAGAAGAGGGCGATGGAATTCCATGTGTAGCGGTAAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGGTCGCCTGGTCTGTAACTGACACTG
28  TGAGGAATATTGGGCAATGGAGGAAACTCTGACCCAGCCATGCCGCGTGAGTGAAGAAGGTTTTCGAATTGTAAAGCTCTTTCGGATGGGACGATGATGACGGTACCATCTAAAGAAGCCCCGGCAAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGGGTGTGTAGGCGGATTTGTAAGATAGTGGTGAAATACCTGAGCTCAACTTAGGAATTGCCATTATAACTATAGATCTGGAGTGACAGAGAGGATATTGGAATACCCAGTGTAGAGGTGAAATTCGTAGATATTGGGTAGAACACCAGTGGCGAAGGCGAGTATCTGGCTGTCAACTGACGCTGAGGCACGAAAGCATGGGGATCAAA
48  TGAGGAATATTGGTCAATGGACGAGAGTCTGAACCAGCCAAGTCGCGTGAGGGAAGACTGCCCTATGGGTTGTAAACCTCTTTTATAAGGGAAGAATAAGTTCTACGTGTAGAATGATGCCTGTACCTTATGAATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTTTATTAAGTTAGTGGTTAAATATTTGAGCTAAACTCAATTGTGCCATTAATACTGGTAAACTGGAGTACAGACGAGGTAGGCGGAATAAGTTAAGTAGCGGTGAAATGCATAGATATAACTTAGAACTCCGATAGCGAAGGCAGCTTACCAGACTGTAACTGACGCTGATGCA
54  TGAGGAATATTGGTCAATGGACGCAAGTCTGAACCAGCCATGCCGCGTGCAGGATGACGGCTCTATGAGTTGTAAACTGCTTTTGTACGAGGGTAAACGCAGATACGTGTATCTGTCTGAAAGTATCGTACGAATAAGGATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTTTGATAAGTTAGAGGTGAAATTTCGGGGCTCAACCCTGAACGTGCCTCTAATACTGTTGAGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAACTATATCTGACGTTGAGGC
57  TGAGGAATATTGGTCAATGGACGCGAGTCTGAACCAGCCAAGTAGCGTGAAGGATGACTGCCCTATGGGTTGTAAACTTCTTTTATATGGGAATAAAGTTGTCCACGTGTGGATTTTTGTATGTACCATATGAATAAGGATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGCGGATTGTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGCAGTCTTGAGTGCAGTAGAGGTGGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTCACTGGAGTGTAACTGACGCTGATGC
97  TGGGGAATATTGCACAATGGAGGAAACTCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCACGGCAAGCCAGATGTGAAAGCCCGGGGCTCAACCCCGGGACTGCATTTGGAACTGCTGAGCTAGAGTGTCGGAGAGGCAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTGCTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
111 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATCTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGCAGACGGCACTGCAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGTAGAGCTAGAGTGCTGGAGAGGCAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTGCTGGACAGTAACTGACGTTCAGGCTCGAAAGCGTGGGGAGCAAA
112 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCTGTGCAAGTCTGAAGTGAAAGGCATGGGCTCAACCTGTGGACTGCTTTGGAAACTGTGCAGCTAGAGTGTCGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
115 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATCTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGCAGACGGCGATGCAAGTCTGGAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGTATGGCTAGAGTGCTGGAGAGGCAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTGCTGGACAGTAACTGACGTTCAGGCTCGAAAGCGTGGGGAGCAAA
116 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGAAGCAAGTCTGAAGTGAAAACCCAGGGCTCAACCCTGGGACTGCTTTGGAAACTGTTTTGCTAGAGTGTCGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGATAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
117 TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTAAAGCAAGTCTGAAGTGAAAGCCCGCGGCTCAACTGCGGGACTGCTTTGGAAACTGTTTAACTGGAGTGTCGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGACTTACTGGACGATAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
137 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAAGGATGAAGGCCTTTGGGTCGTAAACTTCTGTTCTAAGGGAAGATAGTGACGGTACCTTAGGAGCAAGTCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGAGTACGTAGGTGGTTTTCTAAGCACGGGGTTTAAGGCAATGGCTTAACCATTGTTCGCCTTGTGAACTGGAAGACTTGAGTGCAGGAGAGGAAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTTCTGGACTGTAACTGACACTGAGGTACGAAAGCGTGGGGAGCAAAC
139 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGAAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTTTTGCAAGTCTGAAGTGAAAGCCCGGGGCTTAACCCCGGGACTGCTTTGGAAACTGTAGAACTAGAGTGCAGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACTGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
140 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGAAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTTTTGCAAGTCTGAAGTGAAAGCCCGGGGCTTAACCCCGGGACTGCTTTGGAAACTGTAGGACTAGAGTGCAGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACTGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
142 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCAGCGCAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGAATGGCTTTGGAAACTGTGCAGCTAGAGTACCGGAGGGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
145 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGAGACAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGCCTTGCTAGAGTGCTGGAGAGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACAGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
169 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCATTGCAAGCCAGATGTGAAAGCCCGGGGCTCAACCCCGGGACTGCATTTGGAACTGTAGAGCTAGAGTGTCGGAGAGGCAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTGCTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
170 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTATGGCAAGTCTGATGTGAAAGGCCAGGGCTCAACCCTGGGACTGCATTGGAAACTGTCGAACTAGAGTGTCGGAGAGGCAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTGCTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
175 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGCGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCACCGGCTAAATACGTGCCAGCAGCCGCGGTAATACGTATGGTGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCTGTGTAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTATGCAGCTAGAGTGTCGGAGAGGTAAGTGGAATTCCCAGTGTAGCGGTGAAATGCGTAGATATTGGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
180 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGCGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGAGGCAAGTCTGATGTGAAAACCCGGGGCTCAACCCCGTGACTGCATTGGAAACTGTTTTGCTTGAGTGCCGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACGGCAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
187 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGCGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGATAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCAAGGCAAGTCTGATGTGAAAACCCAGGGCTTAACCCTGGGACTGCATTGGAAACTGTCTGGCTCGAGTGCCGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTACTGGACGGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
210 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGGTGGTATGGCAAGTCAGAGGTGAAAACCCAGGGCTTAACCTTGGGATTGCCTTTGAAACTGTCAGACTAGAGTGCAGGAGGGGTAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACTGTAACTGACACTGAGGCTCGAAAGCGTGGGGAGCAAA
237 TGGGGAATATTGCGCAATGGGGGAAACCCTGACGCAGCAACGCCGCGTGATTGAAGAAGGCCTTCGGGTTGTAAAGATCTTTAATCAGGGACGAAACAAATGACGGTACCTGAAGAATAAGCTCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGAGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGCGCGCAGGCGGGCCGGTAAGTTGGAAGTGAAATCTATGGGCTTAACCCATAAACTGCTTTCAAAACTGCTGGTCTTGAGTGATGGAGAGGCAGGCGGAATTCCGTGTGTAGCGGTGAAATGCGTAGATATACGGAGGAACACCAGTGGCGAAGGCGGCCTGCTGGACATTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGC
242 TGGGGAATATTGGGCAATGGACGCAAGTCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTGTCAGGGAACAGTAGAAGAGGGTACCTGACGAATAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGTGTAAAGGGCGTGCAGCCGGGCTGGCAAGTCAGGCGTGAAATCCCAGGGCTCAACCCTGGAACTGCGTTTGAAACTGCTGGTCTTGAGTACCGGAGAGGTCATCGGAATTCCTTGTGTAGCGGTGAAATGCGTAGATATAAGGAAGAACACCAGTGGCGAAGGCGGATGACTGGACGGCAACTGACGGTGAGGCGCGAAAGCGTGGGGAGCA
246 TGGGGAATATTGGGCAATGGACGCAAGTCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTGTCAGGGAAGAGTAGAAGACGGTACCTGACGAATAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGTGTAAAGGGCGTGCAGCCGGGCCGGCAAGTCAGATGTGAAATCTGGAGGCTTAACCTCCAAACTGCATTTGAAACTGTAGGTCTTGAGTACCGGAGAGGTTATCGGAATTCCTTGTGTAGCGGTGAAATGCGTAGATATAAGGAAGAACACCAGTGGCGAAGGCGGATAACTGGACGGCAACTGACGGTGAGGCGCGAAAGCGTGGGGAGCA
262 TGGGGAATATTGGGCAATGGGCGCAAGCCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTATGAGGGACGAAGGAAGTGACGGTACCTCATGAATAAGCTCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGAGCGAGCGTTATCCGGATTTACTGGGTGTAAAGGGCGTGTAGGCGGGGAAGCAAGTCAGATGTGAAAACCAGTGGCTCAACCACTGGCCTGCATTTGAAACTGTTTTTCTTGAGTGATGGAGAGGCAGGCGGAATTCCGTGTGTAGCGGTGAAATGCGTAGATATACGGAGGAACACCAGTGGCGAAGGCGGCCTGCTGGACATTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGC
264 TGGGGAATATTGGGCAATGGGCGCAAGCCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTCTCAGGGACGAACAAATGACGGTACCTGAGGAATAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGCGTGTAGGCGGGAAGGCAAGTCAGATGTGAAAACTATGGGCTCAACCCATAGCCTGCATTTGAAACTGTTTTTCTTGAGTGCTGGAGAGGCAATCGGAATTCCGTGTGTAGCGGTGAAATGCGTAGATATACGGAGGAACACCAGTGGCGAAGGCGGATTGCTGGACAGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCA
292 TGGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGACGGCCTTCGGGTTGTAAAGCTCTGTTAATCGGGACGAAAGGCCTTCTTGCGAATAGTGAGAAGGATTGACGGTACCGGAATAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGATAGGTCAGTCTGTCTTAAAAGTTCGGGGCTTAACCCCGTGATGGGATGGAAACTGCCAATCTAGAGTATCGGAGAGGAAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGACTTTCTGGACGAAAACTGACGCT
293 TGGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGACGGCCTTCGGGTTGTAAAGCTCTGTTAATCGGGACGAAAGGCCTTCTTGCGAATAGTTAGAAGGATTGACGGTACCGGAATAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGATTGGTCAGTCTGTCTTAAAAGTTCGGGGCTTAACCCCGTGATGGGATGGAAACTGCCAATCTAGAGTATCGGAGAGGAAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGACTTTCTGGACGAAAACTGACGCT
325 TGGGGGATATTGCACAATGGAGGAAACTCTGATGCAGCGACGCCGCGTGAGTGAAGAAGTATTTCGGTATGTAAAGCTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCGACGCAAGTCTGAAGTGAAATACCCGGGCTCAACCTGGGAACTGCTTTGGAAACTGTGTTGCTAGAGTGCTGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTACTGGACAGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
332 TGGGGGATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAGGGAAGACGGCCTTCGGGTTGTAAACCTCTGTCTTCGGGGACGAATAAATGACGGTACCCGAGGAGGAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTACTGGGTGTAAAGGGAGCGTAGGCGGGGAGGCAAGTTGAATGTCTAAACTATCGGCTCAACTGATAGTCGCGTTCAAAACTGCCACTCTTGAGTGCAGTAGAGGTAGGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCCTACTGGGCTGTAACTGACGCTGAGGCTCGAAAGCGTGGGTAGCAA
335 TGGGGGATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGGAGGAAGAAGGTTTTCGGATTGTAAACTCCTGTCGTTAGGGACGATAATGACGGTACCTAACAAGAAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAAAACGTAGGGTGCAAGCGTTGTCCGGAATTACTGGGTGTAAAGGGAGCGCAGGCGGACCGGCAAGTTGGAAGTGAAAACTATGGGCTCAACCCATAAATTGCTTTCAAAACTGCTGGCCTTGAGTAGTGCAGAGGTAGGTGGAATTCCCGGTGTAGCGGTGGAATGCGTAGATATCGGGAGGAACACCAGTGGCGAAGGCGACCTACTGGGCACCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCAAA
346 TAGGGAATCTTCGGCAATGGACGAAAGTCTGACCGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAACTCTGTTGTTAGAGAAGAACAAGGATGAGAGTAACTGTTCATCCCTTGACGGTATCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGGAAACTGGGAGACTTGAGTGCAGAAGAGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGTAACTGACGCT
357 TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAAGGATGAAGTATTTCGGTATGTAAACTTCTATCAGCAGGGAAGAAAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGCAGTGCAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGTGCAGCTAGAGTGTCGGAGAGGCAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTGCTGGACGATGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAA
367 TGGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGACGGCCTTCGGGTTGTAAAGCTCTGTTAATCGGGACGAATGGTTCTTGTGCAAATAGTGCGAGGATTTGACGGTACCGGAATAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGATTGGTCAGTCTGTCTTAAAAGTTCGGGGCTTAACCCCGTGATGGGATGGAAACTGCCAATCTAGAGTATCGGAGAGGAAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGACTTTCTGGACGAAAACTGACGCT
                                                                                                                                        Taxonomy
10                                    k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__unassigned
15          k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelatoclostridiaceae;g__Erysipelotrichaceae_UCG-003;s__unassigned
20               k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__[Clostridium]_innocuum_group;s__unassigned
28                            k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodospirillales;f__uncultured;g__uncultured;s__unassigned
48                                    k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Odoribacter;s__unassigned
54                                       k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes;s__unassigned
57                                    k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__unassigned
97                                           k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Dorea;s__unassigned
111                              k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
112                                          k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Dorea;s__unassigned
115                              k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
116                              k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
117                              k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium;s__unassigned
137 k__Bacteria;p__Firmicutes;c__Clostridia;o__Peptostreptococcales-Tissierellales;f__Anaerovoracaceae;g__Family_XIII_AD3011_group;s__unassigned
139                                     k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
140                                     k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
142                                     k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
145                                     k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
169                                          k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Dorea;s__unassigned
170                                     k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
175                                    k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus;s__unassigned
180                                         k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__CAG-56;s__unassigned
187                               k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter;s__unassigned
210                                     k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__unassigned;s__unassigned
237                              k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Butyricicoccaceae;g__Butyricicoccus;s__unassigned
242                                k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter;s__unassigned
246                                k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter;s__unassigned
262                                   k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__uncultured;s__unassigned
264                               k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Intestinimonas;s__unassigned
292                 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella;s__unassigned
293                 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella;s__unassigned
325                        k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_UCG-004;s__unassigned
332                              k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Negativibacillus;s__unassigned
335                               k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Subdoligranulum;s__unassigned
346                                     k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus;s__unassigned
357                                          k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Dorea;s__unassigned
367                 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella;s__unassigned
    log2FoldChange      p_value         padj
10       2.6718947 1.020219e-04 2.208475e-03
15      -3.7840235 1.431558e-05 4.052411e-04
20       1.7161594 2.227124e-03 2.129950e-02
28      -1.8328032 2.682573e-03 2.295784e-02
48      -2.4005935 5.703093e-04 7.773105e-03
54      -4.2728652 1.391165e-06 6.399358e-05
57      -2.7446018 2.373042e-03 2.129950e-02
97      -3.1314152 1.355523e-04 2.625433e-03
111     -4.4310528 8.690369e-07 5.330093e-05
112     -3.9852492 2.068392e-07 1.902921e-05
115     -2.4672999 1.366106e-03 1.675756e-02
116      3.2596602 2.646199e-04 4.426370e-03
117      3.3617848 2.408327e-05 6.330459e-04
137     -1.5682632 1.246219e-04 2.547825e-03
139      1.7316915 7.115666e-03 4.438246e-02
140     -1.9647594 7.516426e-03 4.610074e-02
142     -2.1424529 1.739921e-03 1.829402e-02
145     -4.2349983 1.293743e-06 6.399358e-05
169     -2.1064068 5.031662e-05 1.234435e-03
170     -3.1944423 2.168950e-08 3.990869e-06
175     -4.3755342 7.466843e-08 9.159327e-06
180     -3.1928924 3.580378e-04 5.362121e-03
187     -2.2441970 3.571741e-03 2.738335e-02
210     -1.5342059 2.556985e-03 2.240406e-02
237     -2.1219142 1.656897e-03 1.829402e-02
242     -0.8792198 3.883155e-03 2.916329e-02
246     -2.8737528 7.715727e-06 2.581261e-04
262     -2.0515366 3.567012e-03 2.738335e-02
264     -2.4237638 3.897547e-06 1.434297e-04
292      3.0513101 3.648769e-06 1.434297e-04
293      1.7601475 6.362803e-03 4.037089e-02
325     -2.2782634 1.658920e-03 1.829402e-02
332     -1.8536497 9.239523e-04 1.214337e-02
335     -3.4581653 4.652641e-04 6.585276e-03
346      3.6455563 4.788864e-07 3.524604e-05
357      1.4450185 4.835811e-03 3.469320e-02
367      1.7110108 3.642745e-04 5.362121e-03

3.4.2.2 Phylum level

level="phylum"
path_maaslin="../intermediate_files/maaslin/Q1/Phylum/"
raw_linda_results_phylum[[segment]] <- list()
linda_results_phylum[[segment]] <- list()

Aggregate taxa

phylum_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = "Phylum")

colon_phylum_tab <- phylum_data[[1]]
colon_phylum_taxa_tab <- phylum_data[[2]]
3.4.2.2.1 pre_ltx vs healthy
group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA

# prepare the data
linda_data <- binomial_prep(colon_phylum_tab,
                            colon_phylum_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 2 ASV(s)
Removing 1 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata, 
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  411  samples and  8  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))
Using Phylum for naming
volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")
Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction effect")
Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2

Volcano plot

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

# see the results
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                                                                      segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                                    filt_colon_uni_data,
                                                    filt_colon_uni_metadata,
                                                    segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.2.2.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA

# prepare the data
linda_data <- binomial_prep(colon_phylum_tab,
                            colon_phylum_taxa_tab,
                            colon_metadata,group, usage="linDA")
Removing 1 ASV(s)
Removing 1 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata, 
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  600  samples and  8  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))
Using Phylum for naming
volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")
Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction effect")
Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

# see the results
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment=segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                               segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.2.2.3 post_ltx vs healthy
group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA

# prepare the data
linda_data <- binomial_prep(colon_phylum_tab,
                            colon_phylum_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 2 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  511  samples and  8  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))
Using Phylum for naming
volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")
Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction effect")
Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.2.2.4 Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,uni_statistics$colon[grepl(level,names(uni_statistics$colon))],colon_taxa_tab)
min_clr -4.297728 
max_clr 5.183865 
min_log -2.728687 
max_log 2.86356 
dotheatmap_linda

3.4.2.2.5 PSC effect

pre_LTx vs Healthy and Post_LTx vs Healthy intersection

imagename
A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)

psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
  
# see the results
psc_effect[[paste(segment,level)]] 
[1] SeqID          Taxonomy       log2FoldChange p_value        padj          
<0 rows> (or 0-length row.names)

3.4.2.3 Saving results

# ALL DATA
saveWorkbook(supplements_wb,file.path(path,paste0("supplements_uni_analysis_wb_",segment,".xlsx")),overwrite = TRUE)

# PSC effect
write.xlsx(psc_effect,
          file.path(path,paste0("supplements_psc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa
write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("supplements_significant_taxa_",segment,".xlsx")))

3.5 Machine learning

Binary classification was performed using four independent models: Elastic Net (ENET), RF (RF), Gradient Boosting (GBoost), K-nearest Neighbors (kNN). Training and validation were performed via bootstrapping (N=500) on clr-transformed data. Model performance metrics expressed as AUC were calculated based on an out-of-sample principle.

The supplementary analysis contains training and validating at clr-transformed data at ASV level and relative-abundance data at ASV and genus level.

path = "../results/Q1/models"

3.5.1 ENET

model="enet"

3.5.1.1 ASV level

level="ASV"
3.5.1.1.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           0.200000000
lambda                          0.007285712
auc                             1.000000000
auc_czech                       1.000000000
auc_no                          1.000000000
auc_optimism_corrected          0.944482930
auc_optimism_corrected_CIL      0.879483337
auc_optimism_corrected_CIU      0.986503241
accuracy                        1.000000000
accuracy_czech                          NaN
accuracy_no                     1.000000000
accuracy_optimism_corrected     0.872590475
accuracy_optimism_corrected_CIL 0.773863933
accuracy_optimism_corrected_CIU 0.937692308
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 161   0
   1   0 250

$czech
    Predicted
True  0  1
   0 95  0
   1  0 42

$no
    Predicted
True   0   1
   0  66   0
   1   0 208
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

3.5.1.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           1.000000000
lambda                          0.004424186
auc                             0.999667622
auc_czech                       0.999656593
auc_no                          0.999522640
auc_optimism_corrected          0.788320408
auc_optimism_corrected_CIL      0.715647773
auc_optimism_corrected_CIU      0.842743983
accuracy                        0.993322204
accuracy_czech                          NaN
accuracy_no                     0.994269341
accuracy_optimism_corrected     0.725877580
accuracy_optimism_corrected_CIL 0.657080387
accuracy_optimism_corrected_CIU 0.769266055
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 347   2
   1   2 248

$czech
    Predicted
True   0   1
   0 207   1
   1   1  41

$no
    Predicted
True   0   1
   0 140   1
   1   1 207
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

3.5.1.1.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.40000000
lambda                          0.00203662
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.96930614
auc_optimism_corrected_CIL      0.93705851
auc_optimism_corrected_CIU      0.98669590
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.91353052
accuracy_optimism_corrected_CIL 0.86451155
accuracy_optimism_corrected_CIU 0.95141794
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 161   0
   1   0 349

$czech
    Predicted
True   0   1
   0  95   0
   1   0 208

$no
    Predicted
True   0   1
   0  66   0
   1   0 141
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

3.5.1.2 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
3.5.1.2.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.00000000
lambda                          0.04262835
auc                             0.99955280
auc_czech                       0.99974937
auc_no                          0.99927156
auc_optimism_corrected          0.93420491
auc_optimism_corrected_CIL      0.88169929
auc_optimism_corrected_CIU      0.97185321
accuracy                        0.98783455
accuracy_czech                         NaN
accuracy_no                     0.98905109
accuracy_optimism_corrected     0.85834482
accuracy_optimism_corrected_CIL 0.79082492
accuracy_optimism_corrected_CIU 0.92284119
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 159   2
   1   3 247

$czech
    Predicted
True  0  1
   0 94  1
   1  1 41

$no
    Predicted
True   0   1
   0  65   1
   1   2 206
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

3.5.1.2.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.00000000
lambda                          0.05181695
auc                             0.96996571
auc_czech                       0.95887446
auc_no                          0.96447081
auc_optimism_corrected          0.75746917
auc_optimism_corrected_CIL      0.67636751
auc_optimism_corrected_CIU      0.83310363
accuracy                        0.91000000
accuracy_czech                         NaN
accuracy_no                     0.90544413
accuracy_optimism_corrected     0.69549915
accuracy_optimism_corrected_CIL 0.61504214
accuracy_optimism_corrected_CIU 0.77137530
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 325  25
   1  29 221

$czech
    Predicted
True   0   1
   0 202   7
   1  14  28

$no
    Predicted
True   0   1
   0 123  18
   1  15 193
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

3.5.1.2.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group",
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           0.600000000
lambda                          0.002245931
auc                             1.000000000
auc_czech                       1.000000000
auc_no                          1.000000000
auc_optimism_corrected          0.961311372
auc_optimism_corrected_CIL      0.923354885
auc_optimism_corrected_CIU      0.985361058
accuracy                        1.000000000
accuracy_czech                          NaN
accuracy_no                     1.000000000
accuracy_optimism_corrected     0.892756667
accuracy_optimism_corrected_CIL 0.838856178
accuracy_optimism_corrected_CIU 0.940785776
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 161   0
   1   0 350

$czech
    Predicted
True   0   1
   0  95   0
   1   0 209

$no
    Predicted
True   0   1
   0  66   0
   1   0 141
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

3.5.1.3 Saving results

models_summ_df_colon <- do.call(rbind, 
  models_summ[grep(segment,names(models_summ),value = TRUE)])

write.csv(models_summ_df_colon,file.path(path,paste0("elastic_net_",segment,".csv")))

3.5.2 Supplementary models

3.5.2.1 CLR-transformed data

3.5.2.1.1 kNN
model="knn"
3.5.2.1.1.1 ASV level
level="ASV"

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               17.0000000
auc                              0.9278012
auc_optimism_corrected           0.8390623
auc_optimism_corrected_CIL       0.7635165
auc_optimism_corrected_CIU       0.9036531
accuracy                         0.8150852
accuracy_optimism_corrected      0.7496215
accuracy_optimism_corrected_CIL  0.6860774
accuracy_optimism_corrected_CIU  0.8177634
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               10.0000000
auc                              0.9259427
auc_optimism_corrected           0.6312841
auc_optimism_corrected_CIL       0.5483666
auc_optimism_corrected_CIU       0.7185543
accuracy                         0.8363940
accuracy_optimism_corrected      0.6068770
accuracy_optimism_corrected_CIL  0.5126121
accuracy_optimism_corrected_CIU  0.6853063
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               10.0000000
auc                              0.9701009
auc_optimism_corrected           0.8647529
auc_optimism_corrected_CIL       0.8049770
auc_optimism_corrected_CIU       0.9012555
accuracy                         0.9019608
accuracy_optimism_corrected      0.8252680
accuracy_optimism_corrected_CIL  0.8068314
accuracy_optimism_corrected_CIU  0.8466299
roc_c

3.5.2.1.1.2 Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# p-value
print(paste("p_value:",mean(knn_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               26.0000000
auc                              0.9423478
auc_optimism_corrected           0.8630524
auc_optimism_corrected_CIL       0.7819441
auc_optimism_corrected_CIU       0.9295728
accuracy                         0.8515815
accuracy_optimism_corrected      0.7772840
accuracy_optimism_corrected_CIL  0.6761526
accuracy_optimism_corrected_CIU  0.8621100
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               15.0000000
auc                              0.8733486
auc_optimism_corrected           0.6517804
auc_optimism_corrected_CIL       0.5548968
auc_optimism_corrected_CIU       0.7336038
accuracy                         0.7800000
accuracy_optimism_corrected      0.6081844
accuracy_optimism_corrected_CIL  0.5233996
accuracy_optimism_corrected_CIU  0.6846266
# p-value
print(paste("p_value:",mean(knn_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0.004"
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               30.0000000
auc                              0.9530169
auc_optimism_corrected           0.8895200
auc_optimism_corrected_CIL       0.8250313
auc_optimism_corrected_CIU       0.9512991
accuracy                         0.8395303
accuracy_optimism_corrected      0.7630536
accuracy_optimism_corrected_CIL  0.6610833
accuracy_optimism_corrected_CIU  0.8620708
# p-value
print(paste("p_value:",mean(knn_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

3.5.2.1.2 RF
model="rf"
3.5.2.1.2.1 ASV level
level="ASV"

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "143"      
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.9599069"
auc_optimism_corrected_CIL      "0.9178946"
auc_optimism_corrected_CIU      "0.9905348"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8984082"
accuracy_optimism_corrected_CIL "0.8416505"
accuracy_optimism_corrected_CIU "0.9411982"
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "37"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.7945271"
auc_optimism_corrected_CIL      "0.7586882"
auc_optimism_corrected_CIU      "0.8382909"
accuracy                        "1"        
accuracy_optimism_corrected     "0.7230647"
accuracy_optimism_corrected_CIL "0.6800637"
accuracy_optimism_corrected_CIU "0.7797127"
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "237"      
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.9177307"
auc_optimism_corrected_CIL      "0.8786343"
auc_optimism_corrected_CIU      "0.95099"  
accuracy                        "1"        
accuracy_optimism_corrected     "0.8817967"
accuracy_optimism_corrected_CIL "0.8375135"
accuracy_optimism_corrected_CIU "0.929279" 
roc_c

3.5.2.1.2.2 Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "23"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.9488154"
auc_optimism_corrected_CIL      "0.8952989"
auc_optimism_corrected_CIU      "0.9862301"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8831177"
accuracy_optimism_corrected_CIL "0.8094864"
accuracy_optimism_corrected_CIU "0.9377419"
# p-value
print(paste("p_value:",mean(rf_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "71"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.7772877"
auc_optimism_corrected_CIL      "0.6827278"
auc_optimism_corrected_CIU      "0.8559813"
accuracy                        "1"        
accuracy_optimism_corrected     "0.6880011"
accuracy_optimism_corrected_CIL "0.6"      
accuracy_optimism_corrected_CIU "0.770192" 
# p-value
print(paste("p_value:",mean(rf_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "65"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.9358754"
auc_optimism_corrected_CIL      "0.8669191"
auc_optimism_corrected_CIU      "0.9857715"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8946156"
accuracy_optimism_corrected_CIL "0.8198506"
accuracy_optimism_corrected_CIU "0.9517422"
# p-value
print(paste("p_value:",mean(rf_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

3.5.2.1.3 Gradient boosting
model="gb"
3.5.2.1.3.1 ASV level
level="ASV"

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9711391
auc_optimism_corrected_CIL        0.9410907
auc_optimism_corrected_CIU        0.9950271
accuracy                          1.0000000
accuracy_optimism_corrected       0.9217283
accuracy_optimism_corrected_CIL   0.8795956
accuracy_optimism_corrected_CIU   0.9639860
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                             reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.8062275
auc_optimism_corrected_CIL        0.7559881
auc_optimism_corrected_CIU        0.8528397
accuracy                          1.0000000
accuracy_optimism_corrected       0.7310713
accuracy_optimism_corrected_CIL   0.6566925
accuracy_optimism_corrected_CIU   0.8001997
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   30.0000000
auc                               1.0000000
auc_optimism_corrected            0.9643631
auc_optimism_corrected_CIL        0.9401224
auc_optimism_corrected_CIU        0.9858289
accuracy                          1.0000000
accuracy_optimism_corrected       0.9249490
accuracy_optimism_corrected_CIL   0.8934412
accuracy_optimism_corrected_CIU   0.9505495
roc_c

3.5.2.1.3.2 Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               1.0000000
auc_optimism_corrected            0.9655809
auc_optimism_corrected_CIL        0.9237801
auc_optimism_corrected_CIU        0.9915438
accuracy                          1.0000000
accuracy_optimism_corrected       0.9095237
accuracy_optimism_corrected_CIL   0.8466613
accuracy_optimism_corrected_CIU   0.9587695
# p-value
print(paste("p_value:",mean(gbm_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.7993128
auc_optimism_corrected_CIL        0.7208963
auc_optimism_corrected_CIU        0.8672809
accuracy                          1.0000000
accuracy_optimism_corrected       0.7120601
accuracy_optimism_corrected_CIL   0.6299534
accuracy_optimism_corrected_CIU   0.7840918
# p-value
print(paste("p_value:",mean(gbm_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9632183
auc_optimism_corrected_CIL        0.9218211
auc_optimism_corrected_CIU        0.9922280
accuracy                          1.0000000
accuracy_optimism_corrected       0.9189343
accuracy_optimism_corrected_CIL   0.8639337
accuracy_optimism_corrected_CIU   0.9631778
# p-value
print(paste("p_value:",mean(gbm_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

3.5.2.2 Relative abundances

3.5.2.2.1 Elastic net
3.5.2.2.1.1 ASV level
level="ASV"

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           0.400000000
lambda                          0.001453833
auc                             1.000000000
auc_czech                       1.000000000
auc_no                          1.000000000
auc_optimism_corrected          0.958045499
auc_optimism_corrected_CIL      0.919974493
auc_optimism_corrected_CIU      0.979096386
accuracy                        1.000000000
accuracy_czech                          NaN
accuracy_no                     1.000000000
accuracy_optimism_corrected     0.896262270
accuracy_optimism_corrected_CIL 0.844885708
accuracy_optimism_corrected_CIU 0.933223776
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           1.000000000
lambda                          0.008597151
auc                             0.994934097
auc_czech                       0.997367216
auc_no                          0.990964266
auc_optimism_corrected          0.693086871
auc_optimism_corrected_CIL      0.599715149
auc_optimism_corrected_CIU      0.758895468
accuracy                        0.954924875
accuracy_czech                          NaN
accuracy_no                     0.936962751
accuracy_optimism_corrected     0.658614418
accuracy_optimism_corrected_CIL 0.578956424
accuracy_optimism_corrected_CIU 0.714558559
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.40000000
lambda                          0.01071588
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.92787523
auc_optimism_corrected_CIL      0.88438933
auc_optimism_corrected_CIU      0.96548937
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.86825268
accuracy_optimism_corrected_CIL 0.82293956
accuracy_optimism_corrected_CIU 0.91311501
roc_c

3.5.2.2.1.2 Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           0.600000000
lambda                          0.005344515
auc                             1.000000000
auc_czech                       1.000000000
auc_no                          1.000000000
auc_optimism_corrected          0.941556388
auc_optimism_corrected_CIL      0.911600992
auc_optimism_corrected_CIU      0.972603351
accuracy                        1.000000000
accuracy_czech                          NaN
accuracy_no                     1.000000000
accuracy_optimism_corrected     0.892543470
accuracy_optimism_corrected_CIL 0.856537557
accuracy_optimism_corrected_CIU 0.916957058
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.00000000
lambda                          0.05444562
auc                             0.96121143
auc_czech                       0.98359535
auc_no                          0.92307692
auc_optimism_corrected          0.71357822
auc_optimism_corrected_CIL      0.64835385
auc_optimism_corrected_CIU      0.78840887
accuracy                        0.88666667
accuracy_czech                         NaN
accuracy_no                     0.84240688
accuracy_optimism_corrected     0.65567603
accuracy_optimism_corrected_CIL 0.61790490
accuracy_optimism_corrected_CIU 0.71605550
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.00000000
lambda                          0.03612602
auc                             0.99948536
auc_czech                       0.99919416
auc_no                          0.99978508
auc_optimism_corrected          0.90741276
auc_optimism_corrected_CIL      0.85907537
auc_optimism_corrected_CIU      0.93834169
accuracy                        0.98630137
accuracy_czech                         NaN
accuracy_no                     0.99033816
accuracy_optimism_corrected     0.84975642
accuracy_optimism_corrected_CIL 0.80744317
accuracy_optimism_corrected_CIU 0.89348916
roc_c

3.5.2.2.2 kNN
3.5.2.2.2.1 ASV level
level="ASV"

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               10.0000000
auc                              0.9469814
auc_optimism_corrected           0.8313865
auc_optimism_corrected_CIL       0.7623059
auc_optimism_corrected_CIU       0.9127872
accuracy                         0.8418491
accuracy_optimism_corrected      0.7280742
accuracy_optimism_corrected_CIL  0.6065461
accuracy_optimism_corrected_CIU  0.8395928
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               11.0000000
auc                              0.8853123
auc_optimism_corrected           0.6461102
auc_optimism_corrected_CIL       0.5576335
auc_optimism_corrected_CIU       0.6959418
accuracy                         0.7829716
accuracy_optimism_corrected      0.5998313
accuracy_optimism_corrected_CIL  0.5435465
accuracy_optimism_corrected_CIU  0.6506473
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               27.0000000
auc                              0.8988058
auc_optimism_corrected           0.7833043
auc_optimism_corrected_CIL       0.7181715
auc_optimism_corrected_CIU       0.8345504
accuracy                         0.8470588
accuracy_optimism_corrected      0.7398395
accuracy_optimism_corrected_CIL  0.6948382
accuracy_optimism_corrected_CIU  0.7959477
roc_c

3.5.2.2.2.2 Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               11.0000000
auc                              0.9567826
auc_optimism_corrected           0.8368374
auc_optimism_corrected_CIL       0.8155406
auc_optimism_corrected_CIU       0.8653107
accuracy                         0.8710462
accuracy_optimism_corrected      0.7147966
accuracy_optimism_corrected_CIL  0.6117988
accuracy_optimism_corrected_CIU  0.7849673
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               11.0000000
auc                              0.8806571
auc_optimism_corrected           0.7054169
auc_optimism_corrected_CIL       0.6762969
auc_optimism_corrected_CIU       0.7323169
accuracy                         0.7966667
accuracy_optimism_corrected      0.6665210
accuracy_optimism_corrected_CIL  0.6304974
accuracy_optimism_corrected_CIU  0.7159198
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               10.0000000
auc                              0.9378971
auc_optimism_corrected           0.8158411
auc_optimism_corrected_CIL       0.7672457
auc_optimism_corrected_CIU       0.8676063
accuracy                         0.8688845
accuracy_optimism_corrected      0.7515841
accuracy_optimism_corrected_CIL  0.7030274
accuracy_optimism_corrected_CIU  0.8059060
roc_c

3.5.2.2.3 RF
3.5.2.2.3.1 ASV level
level="ASV"

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "35"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.9853502"
auc_optimism_corrected_CIL      "0.9603161"
auc_optimism_corrected_CIU      "0.9975648"
accuracy                        "1"        
accuracy_optimism_corrected     "0.9357294"
accuracy_optimism_corrected_CIL "0.8963622"
accuracy_optimism_corrected_CIU "0.9711765"
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "5"        
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.8473815"
auc_optimism_corrected_CIL      "0.797293" 
auc_optimism_corrected_CIU      "0.8816524"
accuracy                        "1"        
accuracy_optimism_corrected     "0.747765" 
accuracy_optimism_corrected_CIL "0.7101008"
accuracy_optimism_corrected_CIU "0.7993964"
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "29"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.956065" 
auc_optimism_corrected_CIL      "0.9327369"
auc_optimism_corrected_CIU      "0.9767077"
accuracy                        "1"        
accuracy_optimism_corrected     "0.9008109"
accuracy_optimism_corrected_CIL "0.8833349"
accuracy_optimism_corrected_CIU "0.9116247"
roc_c

3.5.2.2.3.2 Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "29"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.9666467"
auc_optimism_corrected_CIL      "0.9478243"
auc_optimism_corrected_CIU      "0.9915155"
accuracy                        "1"        
accuracy_optimism_corrected     "0.9050136"
accuracy_optimism_corrected_CIL "0.8613362"
accuracy_optimism_corrected_CIU "0.9496371"
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "5"        
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.8200504"
auc_optimism_corrected_CIL      "0.7705389"
auc_optimism_corrected_CIU      "0.8738565"
accuracy                        "1"        
accuracy_optimism_corrected     "0.7253942"
accuracy_optimism_corrected_CIL "0.68768"  
accuracy_optimism_corrected_CIU "0.7775975"
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "21"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.9442324"
auc_optimism_corrected_CIL      "0.8986835"
auc_optimism_corrected_CIU      "0.98245"  
accuracy                        "1"        
accuracy_optimism_corrected     "0.8922164"
accuracy_optimism_corrected_CIL "0.8508627"
accuracy_optimism_corrected_CIU "0.9221786"
roc_c

3.5.2.2.4 Gradient boosting
3.5.2.2.4.1 ASV level
level="ASV"

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               1.0000000
auc_optimism_corrected            0.9820545
auc_optimism_corrected_CIL        0.9690812
auc_optimism_corrected_CIU        0.9938243
accuracy                          1.0000000
accuracy_optimism_corrected       0.9297256
accuracy_optimism_corrected_CIL   0.8889593
accuracy_optimism_corrected_CIU   0.9639860
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               1.0000000
auc_optimism_corrected            0.8302454
auc_optimism_corrected_CIL        0.7760046
auc_optimism_corrected_CIU        0.8687548
accuracy                          1.0000000
accuracy_optimism_corrected       0.7523700
accuracy_optimism_corrected_CIL   0.7024771
accuracy_optimism_corrected_CIU   0.8019913
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   30.0000000
auc                               1.0000000
auc_optimism_corrected            0.9764553
auc_optimism_corrected_CIL        0.9568100
auc_optimism_corrected_CIU        0.9874606
accuracy                          1.0000000
accuracy_optimism_corrected       0.9273141
accuracy_optimism_corrected_CIL   0.8983308
accuracy_optimism_corrected_CIU   0.9509660
roc_c

3.5.2.2.4.2 Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                             reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         100.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9710847
auc_optimism_corrected_CIL        0.9503172
auc_optimism_corrected_CIU        0.9855660
accuracy                          1.0000000
accuracy_optimism_corrected       0.9064350
accuracy_optimism_corrected_CIL   0.8692260
accuracy_optimism_corrected_CIU   0.9273529
roc_c

pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               1.0000000
auc_optimism_corrected            0.8269826
auc_optimism_corrected_CIL        0.7823150
auc_optimism_corrected_CIU        0.8818896
accuracy                          1.0000000
accuracy_optimism_corrected       0.7418934
accuracy_optimism_corrected_CIL   0.6947787
accuracy_optimism_corrected_CIU   0.7919780
roc_c

post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                             reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               1.0000000
auc_optimism_corrected            0.9680632
auc_optimism_corrected_CIL        0.9425710
auc_optimism_corrected_CIU        0.9886108
accuracy                          1.0000000
accuracy_optimism_corrected       0.9205555
accuracy_optimism_corrected_CIL   0.8830395
accuracy_optimism_corrected_CIU   0.9530963
roc_c

3.5.2.3 Saving results

models_list <- list()

for (model_name in names(supplements_models$models_summ)){
  df <- do.call(rbind, supplements_models$models_summ[[model_name]])
  models_list[[model_name]] <- df
}

write.xlsx(models_list,
           file=file.path(path,paste0("supplements_models_",segment,".xlsx")),
           rowNames=TRUE)

3.6 Results overview

3.6.0.1 Alpha diversity

knitr::kable(pc_observed[[segment]],
             digits = 3,
             caption = "Results of linear model testing ASV Richness")
Results of linear model testing ASV Richness
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -56.230 11.738 195.570 -4.791 0.000 0.000 ***
healthy vs pre_ltx - CZ vs NO 14.193 10.341 194.538 1.372 0.171 0.256
healthy vs Grouppre_ltx:CountryNO -7.489 15.169 194.358 -0.494 0.622 0.622
pre_ltx vs Grouppost_ltx 28.886 11.811 267.502 2.446 0.015 0.045 *
pre_ltx vs post_ltx - CZ vs NO 6.742 12.174 265.784 0.554 0.580 0.622
pre_ltx vs Grouppost_ltx:CountryNO -24.047 15.112 262.828 -1.591 0.113 0.203
healthy vs Grouppost_ltx -27.299 8.699 250.458 -3.138 0.002 0.009 **
healthy vs post_ltx - CZ vs NO 14.360 11.153 251.729 1.288 0.199 0.256
healthy vs Grouppost_ltx:CountryNO -31.681 14.195 247.531 -2.232 0.027 0.060
knitr::kable(pc_shannon[[segment]],
             digits = 3,
             caption = "Results of linear model testing Shannon index")
Results of linear model testing Shannon index
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.585 0.143 197.588 -4.084 0.000 0.001 ***
healthy vs pre_ltx - CZ vs NO 0.015 0.126 196.213 0.116 0.907 0.909
healthy vs Grouppre_ltx:CountryNO 0.081 0.185 195.965 0.437 0.663 0.852
pre_ltx vs Grouppost_ltx 0.364 0.158 265.003 2.307 0.022 0.056
pre_ltx vs post_ltx - CZ vs NO 0.095 0.163 263.448 0.585 0.559 0.839
pre_ltx vs Grouppost_ltx:CountryNO -0.455 0.202 260.765 -2.253 0.025 0.056
healthy vs Grouppost_ltx -0.220 0.102 247.586 -2.153 0.032 0.058
healthy vs post_ltx - CZ vs NO 0.015 0.131 248.759 0.114 0.909 0.909
healthy vs Grouppost_ltx:CountryNO -0.376 0.167 244.904 -2.255 0.025 0.056
knitr::kable(pc_simpson[[segment]],
             digits = 3,
             caption = "Results of linear model testing Simpson index")
Results of linear model testing Simpson index
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.059 0.019 199.853 -3.066 0.002 0.022 *
healthy vs pre_ltx - CZ vs NO -0.007 0.017 198.127 -0.430 0.667 0.738
healthy vs Grouppre_ltx:CountryNO 0.020 0.025 197.803 0.813 0.417 0.626
pre_ltx vs Grouppost_ltx 0.031 0.026 263.530 1.213 0.226 0.455
pre_ltx vs post_ltx - CZ vs NO 0.013 0.027 261.964 0.476 0.635 0.738
pre_ltx vs Grouppost_ltx:CountryNO -0.052 0.033 259.262 -1.556 0.121 0.363
healthy vs Grouppost_ltx -0.027 0.017 246.082 -1.594 0.112 0.363
healthy vs post_ltx - CZ vs NO -0.007 0.022 247.135 -0.335 0.738 0.738
healthy vs Grouppost_ltx:CountryNO -0.032 0.028 243.697 -1.147 0.253 0.455
knitr::kable(pc_pielou[[segment]],
             digits = 3,
             caption = "Results of linear model testing Pielou index")
Results of linear model testing Pielou index
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.057 0.020 201.130 -2.828 0.005 0.046 *
healthy vs pre_ltx - CZ vs NO -0.008 0.018 199.017 -0.446 0.656 0.781
healthy vs Grouppre_ltx:CountryNO 0.015 0.026 198.602 0.565 0.573 0.781
pre_ltx vs Grouppost_ltx 0.039 0.024 265.797 1.631 0.104 0.234
pre_ltx vs post_ltx - CZ vs NO 0.007 0.025 263.735 0.274 0.785 0.785
pre_ltx vs Grouppost_ltx:CountryNO -0.062 0.031 260.213 -2.015 0.045 0.202
healthy vs Grouppost_ltx -0.018 0.016 247.076 -1.130 0.260 0.467
healthy vs post_ltx - CZ vs NO -0.008 0.020 248.451 -0.394 0.694 0.781
healthy vs Grouppost_ltx:CountryNO -0.047 0.026 243.881 -1.828 0.069 0.206

Plots

alpha_div_plots[[paste(segment,"Country")]]

3.6.0.2 Beta diversity

Main analysis

knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
             digits = 3,
             caption = "Results of PERMANOVA - robust aitchison distance")
Results of PERMANOVA - robust aitchison distance
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 1222.921 7.950 0.019 0.001 0.002 **
pre_ltx vs post_ltx 1 630.146 4.062 0.007 0.008 0.008 **
post_ltx vs healthy 1 1736.169 11.014 0.021 0.001 0.002 **
pre_ltx vs healthy , Country 1 738.576 4.801 0.011 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 1532.675 9.879 0.016 0.001 0.001 ***
post_ltx vs healthy , Country 1 1728.746 10.966 0.021 0.001 0.001 ***
pre_ltx vs healthy : Country 1 318.719 2.077 0.005 0.325 0.488
pre_ltx vs post_ltx : Country 1 322.405 2.082 0.003 0.524 0.524
post_ltx vs healthy : Country 1 398.820 2.538 0.005 0.112 0.336

PCA

pca_plots_list[[paste(segment,"genus custom")]]

Supplementary analysis

knitr::kable(
  supplements_beta[!grepl("PCoA",names(supplements_beta)) &
                  (grepl("genus",names(supplements_beta))) & 
                    (grepl(segment,names(supplements_beta)))],
             digits = 3,
             caption = "Supplementary PERMANOVA results: Bray-curtis, Jaccard distances")
Supplementary PERMANOVA results: Bray-curtis, Jaccard distances
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 2.119 10.732 0.024 0.001 0.001 ***
pre_ltx vs post_ltx 1 1.307 6.239 0.010 0.001 0.001 ***
post_ltx vs healthy 1 3.286 17.150 0.031 0.001 0.001 ***
pre_ltx vs healthy , Country 1 2.862 14.497 0.033 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 6.618 31.603 0.049 0.001 0.001 ***
post_ltx vs healthy , Country 1 5.496 28.682 0.052 0.001 0.001 ***
pre_ltx vs healthy : Country 1 0.783 3.993 0.009 0.012 0.018 *
pre_ltx vs post_ltx : Country 1 0.562 2.691 0.004 0.112 0.112
post_ltx vs healthy : Country 1 0.998 5.252 0.009 0.002 0.006 **
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 2.225 7.844 0.018 0.001 0.002 **
pre_ltx vs post_ltx 1 1.382 4.680 0.007 0.002 0.002 **
post_ltx vs healthy 1 3.371 12.044 0.022 0.001 0.002 **
pre_ltx vs healthy , Country 1 3.013 10.622 0.025 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 6.474 21.918 0.035 0.001 0.001 ***
post_ltx vs healthy , Country 1 5.595 19.994 0.037 0.001 0.001 ***
pre_ltx vs healthy : Country 1 0.939 3.330 0.008 0.010 0.015 *
pre_ltx vs post_ltx : Country 1 0.696 2.362 0.004 0.163 0.163
post_ltx vs healthy : Country 1 1.134 4.078 0.008 0.001 0.003 **

PCA

plot_list <- supplements_beta[grepl("PCoA",names(supplements_beta)) &
                              grepl(segment,names(supplements_beta))]

ggarrange(plotlist = plot_list,
          labels=names(plot_list),
          font.label = list(size=5,face="plain"),
          ncol=2,nrow=3)

3.6.0.3 Univariate analysis

Number of significant taxa

knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
      as.data.frame(lapply(psc_effect,nrow))) %>% t() %>% 
  `colnames<-`("Count") %>% 
  `rownames<-`(c(names(list_intersections),"PSC effect ASV","PSC effect Genus","PSC effect Phylum")),caption="Number of significant taxa")
Number of significant taxa
Count
terminal_ileum genus healthy vs pre_ltx 25
terminal_ileum genus pre_ltx vs post_ltx 4
terminal_ileum genus healthy vs post_ltx 38
terminal_ileum ASV healthy vs pre_ltx 42
terminal_ileum ASV pre_ltx vs post_ltx 2
terminal_ileum ASV healthy vs post_ltx 65
terminal_ileum phylum healthy vs pre_ltx 1
terminal_ileum phylum pre_ltx vs post_ltx 0
terminal_ileum phylum healthy vs post_ltx 3
colon genus healthy vs pre_ltx 33
colon genus pre_ltx vs post_ltx 4
colon genus healthy vs post_ltx 41
colon ASV healthy vs pre_ltx 55
colon ASV pre_ltx vs post_ltx 1
colon ASV healthy vs post_ltx 101
colon phylum healthy vs pre_ltx 2
colon phylum pre_ltx vs post_ltx 1
colon phylum healthy vs post_ltx 3
PSC effect ASV 22
PSC effect Genus 37
PSC effect Phylum 0

3.6.0.4 Machine learning

Main models

knitr::kable(models_summ_df_colon %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
             digits=3,caption="Elastic net results")
Elastic net results
alpha lambda auc_optimism_corrected auc_optimism_corrected_CIL auc_optimism_corrected_CIU
pre_ltx vs healthy ASV colon 0.2 0.007 0.944 0.879 0.987
pre_ltx vs post_ltx ASV colon 1.0 0.004 0.788 0.716 0.843
post_ltx vs healthy ASV colon 0.4 0.002 0.969 0.937 0.987
pre_ltx vs healthy genus colon 0.0 0.043 0.934 0.882 0.972
pre_ltx vs post_ltx genus colon 0.0 0.052 0.757 0.676 0.833
post_ltx vs healthy genus colon 0.6 0.002 0.961 0.923 0.985

ROC - Genus level

roc_curve_all_custom(roc_cs[c(10:12)],Q="Q1",
                     model_name="enet_model")

Supplementary models

# Build final dataframe
models_list[["enet_model"]] <- rbind(models_summ_df_ileum,models_summ_df_colon)
final_df <- tibble(row_names = rownames(models_list[[1]]))

# Loop through models and extract required values
for (model_name in names(models_list)) {
  model_df <- models_list[[model_name]]
  
  # Combine AUC_optimism_corrected with its CI values
  final_df[[model_name]] <- paste0(
    round(model_df$auc_optimism_corrected, 3), 
    " (", round(model_df$auc_optimism_corrected_CIL, 3), "; ", 
    round(model_df$auc_optimism_corrected_CIU, 3), ")"
  )
}

knitr::kable(final_df, caption="All models")
All models
row_names knn_model rf_model gbm_model enet_model_ra knn_model_ra rf_model_ra gbm_model_ra enet_model
pre_ltx vs healthy ASV terminal_ileum 0.871 (0.761; 0.954) 0.947 (0.873; 0.995) 0.955 (0.902; 0.992) 0.924 (0.837; 0.972) 0.829 (0.699; 0.932) 0.975 (0.95; 0.993) 0.969 (0.947; 0.995) 0.956 (0.894; 0.996)
pre_ltx vs post_ltx ASV terminal_ileum 0.686 (0.555; 0.794) 0.856 (0.756; 0.941) 0.868 (0.787; 0.935) 0.734 (0.617; 0.834) 0.666 (0.583; 0.736) 0.901 (0.85; 0.953) 0.882 (0.825; 0.933) 0.865 (0.785; 0.932)
post_ltx vs healthy ASV terminal_ileum 0.87 (0.78; 0.953) 0.9 (0.817; 0.971) 0.925 (0.852; 0.982) 0.888 (0.818; 0.94) 0.824 (0.743; 0.909) 0.925 (0.878; 0.964) 0.915 (0.849; 0.959) 0.958 (0.9; 0.99)
pre_ltx vs healthy genus terminal_ileum 0.887 (0.8; 0.96) 0.95 (0.877; 0.996) 0.973 (0.928; 1) 0.939 (0.896; 0.979) 0.819 (0.765; 0.874) 0.967 (0.945; 0.986) 0.97 (0.942; 0.991) 0.966 (0.916; 0.996)
pre_ltx vs post_ltx genus terminal_ileum 0.727 (0.608; 0.833) 0.824 (0.719; 0.906) 0.83 (0.74; 0.912) 0.737 (0.654; 0.812) 0.718 (0.66; 0.792) 0.86 (0.825; 0.904) 0.835 (0.784; 0.889) 0.834 (0.749; 0.915)
post_ltx vs healthy genus terminal_ileum 0.921 (0.857; 0.974) 0.912 (0.821; 0.979) 0.937 (0.868; 0.983) 0.857 (0.801; 0.894) 0.801 (0.734; 0.886) 0.916 (0.847; 0.982) 0.947 (0.907; 0.983) 0.966 (0.922; 0.994)
pre_ltx vs healthy ASV colon 0.839 (0.764; 0.904) 0.96 (0.918; 0.991) 0.971 (0.941; 0.995) 0.958 (0.92; 0.979) 0.831 (0.762; 0.913) 0.985 (0.96; 0.998) 0.982 (0.969; 0.994) 0.944 (0.879; 0.987)
pre_ltx vs post_ltx ASV colon 0.631 (0.548; 0.719) 0.795 (0.759; 0.838) 0.806 (0.756; 0.853) 0.693 (0.6; 0.759) 0.646 (0.558; 0.696) 0.847 (0.797; 0.882) 0.83 (0.776; 0.869) 0.788 (0.716; 0.843)
post_ltx vs healthy ASV colon 0.865 (0.805; 0.901) 0.918 (0.879; 0.951) 0.964 (0.94; 0.986) 0.928 (0.884; 0.965) 0.783 (0.718; 0.835) 0.956 (0.933; 0.977) 0.976 (0.957; 0.987) 0.969 (0.937; 0.987)
pre_ltx vs healthy genus colon 0.863 (0.782; 0.93) 0.949 (0.895; 0.986) 0.966 (0.924; 0.992) 0.942 (0.912; 0.973) 0.837 (0.816; 0.865) 0.967 (0.948; 0.992) 0.971 (0.95; 0.986) 0.934 (0.882; 0.972)
pre_ltx vs post_ltx genus colon 0.652 (0.555; 0.734) 0.777 (0.683; 0.856) 0.799 (0.721; 0.867) 0.714 (0.648; 0.788) 0.705 (0.676; 0.732) 0.82 (0.771; 0.874) 0.827 (0.782; 0.882) 0.757 (0.676; 0.833)
post_ltx vs healthy genus colon 0.89 (0.825; 0.951) 0.936 (0.867; 0.986) 0.963 (0.922; 0.992) 0.907 (0.859; 0.938) 0.816 (0.767; 0.868) 0.944 (0.899; 0.982) 0.968 (0.943; 0.989) 0.961 (0.923; 0.985)
write.csv(final_df,file=file.path(path,"AUC_all_models.csv"),row.names = FALSE)

ROC - genus

rocs_list <- supplements_models$roc_cs
rocs_list[["enet_model"]] <- roc_cs
plot_list <- list()

for (model_name in names(rocs_list)) {
  plot_list[[model_name]] <- roc_curve_all_custom(rocs_list[[model_name]][c(4:6)],
                       Q="Q1",
                       model_name=model_name)
}

p <- ggarrange(plotlist = plot_list,labels = names(rocs_list),font.label = list(face="plain",size=7))

p

4 Paper-ready visualizations

4.1 Alpha diversity

p_A <- alpha_div_plots$`terminal_ileum Country` +
  ggtitle("Terminal ileum")+
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) 

p_B <-  alpha_div_plots$`colon Country` +
  ggtitle("Colon") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) 

Q1_alpha <- ggarrange(p_A,ggplot() + theme_minimal(),p_B,nrow=1, ncol=3,widths = c(1,0.1,1))

Q1_alpha

4.2 Beta diversity

pca_ti <- pca_plots_list$`terminal_ileum genus custom` 
pca_colon <- pca_plots_list$`colon genus custom` 

genus_Q1_beta <- ggarrange(pca_ti,
                           ggplot() + theme_minimal(),
                           pca_colon,ncol=3,
                           widths = c(1,0.1,1))
genus_Q1_beta

Alpha + Beta diversity

alpha_beta <- ggarrange(Q1_alpha,genus_Q1_beta,
                        ncol = 1,nrow=2,labels = c("A","B"))
alpha_beta

4.3 Elastic net

Genus level

#models_to_plot <- c("knn_model","rf_model","gbm_model","enet_model")
#names(models_to_plot) <- c("kNN","RF","GBoost","ENET")

models_to_plot <- c("enet_model")
names(models_to_plot) <- c("ENET")

# ILEUM
plot_list_ileum <- list()
for (model_name in models_to_plot) {
  plot_list_ileum[[model_name]] <- 
    roc_curve_all_custom(rocs_list[[model_name]][c(4:6)],
                       Q="Q1",
                       model_name=model_name,legend = FALSE) + 
    #ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) + 
    theme(plot.title = element_text(face = "bold",size = 10)) 
}

roc_curve_ti <- ggarrange(plotlist = plot_list_ileum)

# COLON

plot_list_colon <- list()
for (model_name in models_to_plot) {
  plot_list_colon[[model_name]] <- 
    roc_curve_all_custom(rocs_list[[model_name]][c(10:12)],
                       Q="Q1",
                       model_name=model_name,legend = FALSE) + 
    #ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) + 
    theme(plot.title = element_text(face = "bold",size = 10))   
}

roc_curve_colon <- ggarrange(plotlist = plot_list_colon)


roc_curve_plot <- ggarrange(roc_curve_ti,
                            ggplot() + theme_minimal(),
                            roc_curve_colon,
                            ncol=3, widths = c(1,0.1,1))
roc_curve_plot

4.4 FIGURE 2

alpha_beta_elastic <- ggarrange(Q1_alpha,genus_Q1_beta,roc_curve_plot,
                        ncol = 1,nrow=3,labels = LETTERS,heights = c(1,1,0.8))
alpha_beta_elastic

alpha_beta_elastic <- ggarrange(alpha_beta_elastic,ggplot() + theme_minimal(),ncol=2,
                                widths = c(1,0.15))

alpha_beta_elastic

4.5 FIGURE3 - DAA

p_ileum <- dot_heatmap_ileum + 
  ggtitle("Terminal ileum") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
        legend.position = "none")


p_colon <- dot_heatmap_colon  +
  ggtitle("Colon") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
        legend.position = "none")

heatmap_plot <- ggarrange(p_ileum,
                          p_colon,
                          ncol = 2,labels=c("A","B"))

heatmap_plot 

5 Session info

sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=Czech_Czechia.utf8  LC_CTYPE=Czech_Czechia.utf8   
[3] LC_MONETARY=Czech_Czechia.utf8 LC_NUMERIC=C                  
[5] LC_TIME=Czech_Czechia.utf8    

time zone: Europe/Prague
tzcode source: internal

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] mice_3.16.0              picante_1.8.2            ape_5.8                 
 [4] lubridate_1.9.3          forcats_1.0.0            stringr_1.5.1           
 [7] tidyverse_2.0.0          kableExtra_1.4.0         tidyr_1.3.1             
[10] gbm_2.2.2                doParallel_1.0.17        iterators_1.0.14        
[13] foreach_1.5.2            ranger_0.17.0            ggvenn_0.1.15           
[16] Maaslin2_1.16.0          purrr_1.0.2              pROC_1.18.5             
[19] glmnet_4.1-8             MicrobiomeStat_1.2       caret_6.0-94            
[22] openxlsx_4.2.6.1         magrittr_2.0.3           emmeans_1.10.7          
[25] lmerTest_3.1-3           robustlmm_3.3-1          lme4_1.1-35.5           
[28] Matrix_1.6-5             mgcv_1.9-1               nlme_3.1-164            
[31] pheatmap_1.0.12          reshape2_1.4.4           vegan_2.6-4             
[34] lattice_0.22-6           permute_0.9-7            ggplotify_0.1.2         
[37] ggrepel_0.9.5            ggpubr_0.6.0             MicrobiotaProcess_1.14.1
[40] phyloseq_1.46.0          ggplot2_3.5.1            tibble_3.2.1            
[43] dplyr_1.1.4              cowplot_1.1.3            readr_2.1.5             
[46] igraph_2.0.3             ccrepe_1.38.1            data.table_1.15.4       

loaded via a namespace (and not attached):
  [1] fs_1.6.4                    matrixStats_1.3.0          
  [3] bitops_1.0-7                RColorBrewer_1.1-3         
  [5] numDeriv_2016.8-1.1         tools_4.3.1                
  [7] backports_1.4.1             R6_2.6.1                   
  [9] lazyeval_0.2.2              jomo_2.7-6                 
 [11] rhdf5filters_1.14.1         withr_3.0.2                
 [13] gridExtra_2.3               cli_3.6.2                  
 [15] Biobase_2.62.0              logging_0.10-108           
 [17] biglm_0.9-3                 sandwich_3.1-1             
 [19] labeling_0.4.3              mvtnorm_1.2-4              
 [21] robustbase_0.99-3           pbapply_1.7-2              
 [23] systemfonts_1.1.0           yulab.utils_0.2.0          
 [25] svglite_2.1.3               parallelly_1.38.0          
 [27] rstudioapi_0.17.1           generics_0.1.3             
 [29] gridGraphics_0.5-1          shape_1.4.6.1              
 [31] car_3.1-3                   zip_2.3.1                  
 [33] biomformat_1.30.0           S4Vectors_0.40.2           
 [35] abind_1.4-8                 infotheo_1.2.0.1           
 [37] lifecycle_1.0.4             multcomp_1.4-28            
 [39] yaml_2.3.8                  carData_3.0-5              
 [41] SummarizedExperiment_1.32.0 Rtsne_0.17                 
 [43] rhdf5_2.46.1                recipes_1.1.1              
 [45] SparseArray_1.2.4           grid_4.3.1                 
 [47] mitml_0.4-5                 crayon_1.5.3               
 [49] pillar_1.10.1               knitr_1.50                 
 [51] optparse_1.7.5              GenomicRanges_1.54.1       
 [53] statip_0.2.3                boot_1.3-31                
 [55] estimability_1.5.1          future.apply_1.11.3        
 [57] codetools_0.2-20            pan_1.9                    
 [59] glue_1.7.0                  ggfun_0.1.8                
 [61] vctrs_0.6.5                 treeio_1.26.0              
 [63] gtable_0.3.6                maditr_0.8.4               
 [65] gower_1.0.1                 xfun_0.51                  
 [67] S4Arrays_1.2.1              prodlim_2024.06.25         
 [69] libcoin_1.0-10              coda_0.19-4.1              
 [71] pcaPP_2.0-4-1               modeest_2.4.0              
 [73] survival_3.5-8              timeDate_4041.110          
 [75] hardhat_1.4.1               lava_1.8.1                 
 [77] statmod_1.5.0               TH.data_1.1-3              
 [79] ipred_0.9-15                ggtree_3.10.1              
 [81] GenomeInfoDb_1.38.8         fBasics_4032.96            
 [83] rpart_4.1.23                colorspace_2.1-0           
 [85] BiocGenerics_0.48.1         DBI_1.2.3                  
 [87] nnet_7.3-19                 ade4_1.7-22                
 [89] tidyselect_1.2.1            timeSeries_4041.111        
 [91] compiler_4.3.1              microbiome_1.24.0          
 [93] xml2_1.3.6                  DelayedArray_0.28.0        
 [95] scales_1.3.0                DEoptimR_1.1-3-1           
 [97] spatial_7.3-17              digest_0.6.35              
 [99] minqa_1.2.8                 rmarkdown_2.29             
[101] XVector_0.42.0              htmltools_0.5.8.1          
[103] pkgconfig_2.0.3             MatrixGenerics_1.14.0      
[105] stabledist_0.7-2            fastmap_1.2.0              
[107] rlang_1.1.3                 htmlwidgets_1.6.4          
[109] farver_2.1.2                zoo_1.8-12                 
[111] jsonlite_1.8.8              ModelMetrics_1.2.2.2       
[113] RCurl_1.98-1.14             modeltools_0.2-23          
[115] Formula_1.2-5               GenomeInfoDbData_1.2.11    
[117] patchwork_1.3.0             Rhdf5lib_1.24.2            
[119] munsell_0.5.1               Rcpp_1.0.12                
[121] ggnewscale_0.5.1            fastGHQuad_1.0.1           
[123] stringi_1.8.3               ggstar_1.0.4               
[125] stable_1.1.6                zlibbioc_1.48.2            
[127] MASS_7.3-60.0.1             plyr_1.8.9                 
[129] listenv_0.9.1               Biostrings_2.70.3          
[131] splines_4.3.1               hash_2.2.6.3               
[133] multtest_2.58.0             hms_1.1.3                  
[135] ggtreeExtra_1.12.0          ggsignif_0.6.4             
[137] stats4_4.3.1                rmutil_1.1.10              
[139] evaluate_1.0.3              nloptr_2.1.1               
[141] tzdb_0.4.0                  getopt_1.20.4              
[143] future_1.34.0               clue_0.3-65                
[145] coin_1.4-3                  broom_1.0.7                
[147] xtable_1.8-4                tidytree_0.4.6             
[149] rstatix_0.7.2               viridisLite_0.4.2          
[151] class_7.3-22                aplot_0.2.5                
[153] IRanges_2.36.0              cluster_2.1.6              
[155] timechange_0.3.0            globals_0.16.3